Abstract. This paper presents a decidable characterization of tree languages that can
be defined by a boolean combination of $\Sigma_1$ sentences. This is a tree extension of the Simon
theorem, which says that a string language can be defined by a boolean combination of $\Sigma_1$
sentences if and only if its syntactic monoid is $\mathcal{J}$-trivial.



Logics for expressing properties of labeled trees and forests figure importantly in several 
different areas of Computer Science. This paper is about logics on finite trees. All the logics 
we consider are less expressive than monadic second-order logic, and thus can be captured 
by finite automata on finite trees. Even with these restrictions, this encompasses a large 
body of important logics, such as variants of first-order logic, temporal logics including 
CTL* or CTL, as well as query languages used in XML. 

One way of trying to understand a logic is to give an effective characterization. An 
effective characterization for a logic C is an algorithm which inputs a tree automaton, and 
says if the language recognized by the automaton can be defined by a sentence of the 
logic C. Although giving an effective characterization may seem an artificial criterion for 
understanding a logic, it has proved to work very well, as witnessed by decades of research, 
especially into logics for words. In the case of words, effective characterizations have been 
studied by applying ideas from algebra: a property of words over a finite alphabet A
defines a set of words, that is, a language $L \subseteq A^*$. As long as the logic in question is no
more expressive than monadic second-order logic, L is a regular language, and definability
in the logic often boils down to verifying a property of the syntactic monoid of L (the
transition monoid of the minimal automaton of L). This approach dates back to the work
of McNaughton and Papert [11] on first-order logic over < (where < denotes the usual linear
ordering of positions within a word). A comprehensive survey, treating many extensions and
restrictions of first-order logic, is given by Straubing [16]. Thérien and Wilke [20, 18, 19]
similarly study temporal logics over words.

An important early discovery in this vein, due to Simon [13], treats word languages
definable in first-order logic over < with low quantifier complexity. Recall that a $\Sigma_1$ sentence
is one that uses only existential quantifiers in prenex normal form, e.g. $\exists x \exists y\; x < y$. Simon
proved that a word language is definable by a boolean combination of $\Sigma_1$ sentences over
< if and only if its syntactic monoid M is $\mathcal{J}$-trivial. This means that for all $m, m' \in M$, if
$MmM = Mm'M$, then $m = m'$. (In other words, distinct elements generate distinct two-sided
semigroup ideals.) Thus one can effectively decide, given an automaton for L, whether
L is definable by such a sentence. (Simon did not discuss logic per se, but phrased his
argument in terms of piecewise testable languages, which are exactly those definable by
boolean combinations of $\Sigma_1$ sentences.)

There has been some recent success in extending these methods to trees and forests.
(We work here with unranked trees and forests, and not binary or ranked ones, since we
believe that the definitions and proofs are cleaner in this setting.) The algebra is more
complicated, because there are two multiplicative structures associated with trees and forests:
a horizontal and a vertical concatenation. Benedikt and Segoufin [1] use these ideas to
effectively characterize sets of trees definable by first-order logic with the parent-child
relation. Bojańczyk [2] gives a decidable characterization of properties definable in a temporal
logic with unary ancestor and descendant operators. Similarly, Bojańczyk and Segoufin [3]
and Place and Segoufin [13] provided decidable characterizations of tree languages definable
in $\Delta_2(<)$ and $FO_2(<, <_h)$, where < denotes the descendant-ancestor relationship while $<_h$
denotes the sibling relationship. The general theory of the 'forest algebras' that underlie
these studies is presented by Bojańczyk and Walukiewicz [6].

In the present paper we provide a further illustration of the utility of these algebraic 
methods by generalizing Simon's theorem from words to trees. In fact, we give several such 
generalizations, differing in the kinds of atomic formulas we allow in our $\Sigma_1$ sentences.

In Section 2 we present our basic terminology concerning trees, forests, and logic. 
Initially our logic contains two orderings: the ancestor relation between nodes in a forest, 
and the depth-first, left-first, total ordering of the nodes of a forest. In Section 3 we describe 
the algebraic apparatus. This is the theory of forest algebras developed in [6]. 

In Section 4 we give our main result, an effective test of whether a given language is
piecewise testable (Theorem 4.1). The test consists of verifying that the syntactic forest
algebra satisfies a particular identity. While we have to some extent drawn on Simon's original
argument, the added complexity of the tree setting makes both formulating the correct
condition and generalizing the proof quite nontrivial. We give a quite different, equivalent
identity in Proposition 18, which makes clear the precise relation between piecewise
testability for forest languages and $\mathcal{J}$-triviality.

In Section 5, we study in detail a variant of our logic in which the binary ancestor
relation is replaced by a ternary closest common ancestor relation, and prove a version of
our main theorem for this case. Section 6 is devoted to other variants: the far simpler
case of languages defined by $\Sigma_1$ sentences (instead of boolean combinations thereof); the
logics in which only the ancestor relation is present, and in which the horizontal ordering
on siblings is present; and, since our algebraic formalism concerns forests rather than trees,
the modifications necessary to obtain an effective characterization of the piecewise testable
tree languages. We discuss some directions for further research in the concluding Section 7.

An earlier, much abbreviated version of this paper, without complete proofs, was pre- 
sented at the 2008 IEEE Symposium on Logic in Computer Science. 



2. Notation

Trees, forests and contexts. In this paper we work with finite unranked ordered trees and
forests over a finite alphabet A. Formally, these are expressions defined inductively as
follows: for any $a \in A$, a is a tree. If $t_1, \ldots, t_n$ is a finite sequence of trees, then $t_1 + \cdots + t_n$ is
a forest. If s is a forest and $a \in A$, then as is a tree. It will also be convenient to have an
empty forest, that we will denote by 0, and this forest is such that a0 = a and 0 + t = t + 0 = t.
Forests and trees alike will be denoted by the letters s, t, u, ...
For example, the forest that we conventionally draw as

[Figure: the forest t drawn as a tree diagram]

corresponds to the expression

t = a(a + bc) + b + c(a + b) .

When there is no ambiguity we use as instead of a(s). In particular bc stands for the tree
whose root has label b and has a unique child of label c.

The notions of node, child, parent, descendant and ancestor relations between nodes
are defined in the usual way. We write x < y to say that x is a strict ancestor of y or,
equivalently, that y is a strict descendant of x. We say that a sequence $y_1, \ldots, y_n$ of nodes
forms a chain if we have $y_i < y_{i+1}$ for all $1 \le i < n$. As our forests are ordered, each forest
induces a natural linear order on its set of nodes that we call the forest-order and denote by
$<_{dfs}$, which corresponds to the depth-first left-first traversal of the forest or, equivalently,
to the order provided by the expression denoting the forest seen as a word. We write $<_h$ for
the horizontal-order, i.e. $x <_h y$ expresses the fact that x is a sibling of y occurring strictly
before y in the forest-order. Finally, the closest common ancestor of two nodes x, y is the
unique node z that is a descendant of all nodes that are ancestors of both x and y.

If we take a forest and replace one of the leaves by a special symbol □, we obtain a
context. This special node is called the hole of the context. Contexts will be denoted using
letters p, q, r. For example, from the forest t given above, we can obtain, among others, the
context

p = a(a + bc) + b + c(□ + b) .

A forest s can be substituted in place of the hole of a context p; the resulting forest is
denoted by ps. If we take the context p above and if s = (b + ca), then

ps = a(a + bc) + b + c(b + ca + b) .

This is depicted in the figure below.


There is a natural composition operation on contexts: the context qp is formed by
replacing the hole of q with p. This operation is associative, and satisfies (pq)s = p(qs) for
all forests s and contexts p and q.

We distinguish a special context, the empty context, denoted □. It satisfies □s = s and
□p = p□ = p for any forest s and context p.

Regular forest languages. A set L of forests over A is called a forest language. There are
several notions of automata for unranked ordered trees, see for instance [8, chapter 8]. They
all recognize the same class of forest languages, called regular, which also corresponds to
definability in MSO as defined below.

Piecewise testable languages. We say that a forest s is a piece of a forest t if there is an
injective mapping from nodes of s to nodes of t that preserves the label of the node together
with the forest-order and the ancestor relationship. An equivalent definition is that the piece
relation is the reflexive transitive closure of the relation

{(pt, pat) : p is a context, a is a node, t is a forest or empty}

In other words, a piece of t is obtained by removing nodes from t while preserving the
forest-order and the ancestor relationship. We write $s \preceq t$ to say that s is a piece of t. In
the example above, a(a + b) + c is a piece of t.
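
The second definition (repeatedly removing single nodes) translates directly into a brute-force test. The following is only a sketch under an assumed encoding, with trees as (label, children) pairs and forests as tuples of trees; the names `node`, `remove_one`, `pieces` and `is_piece` are hypothetical, and the enumeration is exponential in the size of t, so it is meant for small examples only.

```python
def node(label, *children):
    return (label, tuple(children))

def remove_one(forest):
    """All forests obtained by deleting exactly one node; its children take its place."""
    results = set()
    for i, (label, children) in enumerate(forest):
        # Delete the root of the i-th tree: this is the step pat -> pt.
        results.add(forest[:i] + children + forest[i + 1:])
        # Or delete a node strictly below that root.
        for smaller in remove_one(children):
            results.add(forest[:i] + ((label, smaller),) + forest[i + 1:])
    return results

def pieces(forest):
    """The set of all pieces of `forest`, computed as a closure under remove_one."""
    seen = {forest}
    todo = [forest]
    while todo:
        for smaller in remove_one(todo.pop()):
            if smaller not in seen:
                seen.add(smaller)
                todo.append(smaller)
    return seen

def is_piece(s, t):
    return s in pieces(t)

# t = a(a + bc) + b + c(a + b)
t = (node("a", node("a"), node("b", node("c"))), node("b"), node("c", node("a"), node("b")))
example = (node("a", node("a"), node("b")), node("c"))  # a(a + b) + c
print(is_piece(example, t))
```

A forest such as ba is not a piece of t, since no node labeled b in t has a descendant labeled a.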

We extend the notion of piece to contexts. In this case, the hole must be preserved 
while removing the nodes: 




The size of a piece is the size of the corresponding forest, i.e. the number of its nodes. The
notions of piece for forests and contexts are related, of course. For instance, if p, q are
contexts with $p \preceq q$, then $p0 \preceq q0$. Also, conversely, if $s \preceq t$, then there are contexts
$p \preceq q$ with s = p0 and t = q0.

A forest language L over A is called piecewise testable if there exists n > 0 such that
membership of t in L is determined by the set of pieces of t of size n or less. Equivalently, L
is a finite boolean combination of languages $\{t : s \preceq t\}$, where s is a forest. Every piecewise
testable forest language is regular, since given n > 0, a finite automaton can calculate on
input t the set of pieces of t of size no more than n.
Logic. Regularity and piecewise testability correspond to definability in a logic, which we
now describe. A forest can be seen as a logical relational structure. The domain of the
structure is the set of nodes. The signature contains a unary predicate $P_a$ for each symbol a
of the label alphabet A, plus possibly some extra predicates on nodes, such as the descendant
relationship, the forest-order or the closest common ancestor. Let $\Omega$ be a set of predicates.
The predicates $\Omega$ that we use always include $(P_a)_{a \in A}$ and equality, hence we do not explicitly
mention them in the sequel. We use the classical syntax and semantics for first-order logic,
FO($\Omega$), and monadic second-order logic, MSO($\Omega$), building on the predicates in $\Omega$. Given a
sentence $\varphi$ of any of these formalisms, the set of forests that are a model for $\varphi$ is called the
language defined by $\varphi$. In particular a language is definable in MSO$(<, <_h)$ iff it is regular
[8, chapter 8].

A $\Sigma_1(\Omega)$ formula is a formula $\exists x_1 \cdots x_n\, \gamma$, where the formula $\gamma$ is quantifier-free and
uses predicates from $\Omega$. Initially we will consider two predicates on nodes: the ancestor
order x < y and the forest-order $x <_{dfs} y$. Later on, we will see other combinations of
predicates, for instance when the closest common ancestor is added, and the forest-order is
removed.

It is not too hard to show that a forest language L can be defined by a $\Sigma_1(<, <_{dfs})$
sentence if and only if it is closed under adding nodes, i.e.

$pt \in L \Rightarrow pqt \in L$

holds for all contexts p, q and forests t. Moreover this condition can be effectively decided
given any reasonable representation of the language L. We will carry out the details in
Section 6.1.

We are more interested here in the boolean combinations of properties definable in
$\Sigma_1(<, <_{dfs})$. It is easy to see that:

Proposition 2.1. A forest language is piecewise testable iff it is definable by a boolean
combination of $\Sigma_1(<, <_{dfs})$ sentences.

One direction is immediate as for any forest s, the set of forests having s as a piece is
easily definable in $\Sigma_1(<, <_{dfs})$. For instance the sentence

$\exists x, y, z, u\;\; P_a(x) \wedge P_a(y) \wedge P_b(z) \wedge P_c(u) \wedge x < y \wedge x < z \wedge y <_{dfs} z \wedge \neg(x < u) \wedge x <_{dfs} u$

defines the language of forests having a(a + b) + c as a piece.
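
To see the sentence at work, one can evaluate it by brute force on a small forest. The sketch below assumes a hypothetical encoding (trees as (label, children) pairs, forests as tuples of trees, nodes identified with their position in the forest-order); the function names are illustrative and not taken from the paper. The optional flag drops the last conjunct $x <_{dfs} u$, which is enough to change the defined language.

```python
from itertools import product

def node(label, *children):
    return (label, tuple(children))

def nodes_of(forest):
    """Nodes in depth-first, left-first order, as (label, set of strict ancestors)."""
    found = []
    def visit(trees, ancestors):
        for label, children in trees:
            index = len(found)  # position of this node in the forest-order
            found.append((label, ancestors))
            visit(children, ancestors | {index})
    visit(forest, frozenset())
    return found

def has_witness(forest, require_x_before_u=True):
    """Search for nodes x, y, z, u satisfying the quantifier-free part of the sentence."""
    nodes = nodes_of(forest)
    def label(i, letter):
        return nodes[i][0] == letter
    def ancestor(i, j):  # i < j in the notation of the text
        return i in nodes[j][1]
    for x, y, z, u in product(range(len(nodes)), repeat=4):
        if (label(x, "a") and label(y, "a") and label(z, "b") and label(u, "c")
                and ancestor(x, y) and ancestor(x, z) and y < z
                and not ancestor(x, u)
                and (x < u or not require_x_before_u)):
            return True
    return False

t = (node("a", node("a"), node("b", node("c"))), node("b"), node("c", node("a"), node("b")))
swapped = (node("c"), node("a", node("a"), node("b")))  # c + a(a + b)
print(has_witness(t), has_witness(swapped), has_witness(swapped, require_x_before_u=False))
```

The forest c + a(a + b) satisfies the sentence only once the conjunct $x <_{dfs} u$ is dropped, because its unique c precedes the a-rooted tree.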

For the other direction, notice that for any language definable in $\Sigma_1(<, <_{dfs})$, by
disambiguating the relative positions between each pair of variables, one can compute a finite
set of pieces such that a forest belongs to the language iff it has one of them as a piece. For
instance the sentence

$\exists x, y, z, u\;\; P_a(x) \wedge P_a(y) \wedge P_b(z) \wedge P_c(u) \wedge x < y \wedge x < z \wedge y <_{dfs} z \wedge \neg(x < u)$

defines the language of forests having a(a + b) + c, c + a(a + b) or ca(a + b) as a piece.

This result does not address the question of effectively determining whether a given 
regular forest language admits either of these equivalent descriptions. Such an effective 
characterization is the goal of this paper: 
The problem. Find an algorithm that decides whether or not a given regular forest language
is piecewise testable.

As noted in the introduction, the corresponding problem for words was solved by Simon,
who showed that a word language L is piecewise testable if and only if its syntactic monoid
M(L) is $\mathcal{J}$-trivial [13]; that is, if distinct elements m, m' always generate distinct two-sided
ideals. Note that one can test, given the multiplication table of a finite monoid M, whether
M is $\mathcal{J}$-trivial in time polynomial in |M|: for each $m \neq m' \in M$, one calculates the ideals
MmM and Mm'M and then verifies that they are different. Therefore, it is decidable if a
given regular word language is piecewise testable. We assume that the language L is given
by its syntactic monoid and syntactic morphism, or by some other representation, such as
a finite automaton, from which these can be effectively computed.
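
The test just described can be sketched in a few lines. The code below assumes the monoid is given as a list of elements together with a multiplication function; both example monoids and the function name `is_j_trivial` are chosen here for illustration only.

```python
def is_j_trivial(elements, mul):
    """Check that distinct elements generate distinct two-sided ideals MmM."""
    def ideal(m):
        return frozenset(mul(mul(x, m), y) for x in elements for y in elements)
    return len({ideal(m) for m in elements}) == len(elements)

# Truncated addition on {0, 1, 2}: the ideals are {0,1,2}, {1,2} and {2}.
def capped(x, y):
    return min(x + y, 2)

# The cyclic group Z_3: every element generates the whole monoid.
def mod3(x, y):
    return (x + y) % 3

print(is_j_trivial([0, 1, 2], capped), is_j_trivial([0, 1, 2], mod3))
```

Each ideal is computed with $|M|^2$ multiplications, so the whole check stays polynomial in |M|, as claimed in the text.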

We will show that a similar characterization can be found for forests, although the
characterization will be more involved. For decidability, it is not important how the input
language is represented. In this paper, we will represent a forest language by a morphism 
into a finite forest algebra that recognizes it. Forest algebras are described in the next 
section. 

3. Forest algebras 

Forest algebras. Forest algebras were introduced by Bojańczyk and Walukiewicz as an
algebraic formalism for studying regular tree languages [6]. Here we give a brief summary of the
definition of these algebras and their important properties. A forest algebra consists of a
pair (H, V) of monoids, subject to some additional requirements, which we describe below.
We write the operation in V multiplicatively and the operation in H additively, although H
is not assumed to be commutative. We denote the identity of V by □ and that of H by 0.
We require that V act on the left of H. That is, there is a map

$(h, v) \mapsto vh \in H$

such that

w(vh) = (wv)h

for all $h \in H$ and $v, w \in V$. We further require that this action be monoidal, that is,

□ · h = h

for all $h \in H$, and that it be faithful, that is, if vh = wh for all $h \in H$, then v = w.

We further require that for every $g \in H$, V contains elements (□ + g) and (g + □) such
that

(□ + g)h = h + g,   (g + □)h = g + h

for all $h \in H$. Observe, in particular, that for all $g, h \in H$,

(g + □)(h + □) = (g + h) + □,

so that the map $h \mapsto h + \Box$ is a morphism embedding H as a submonoid of V.

A morphism $\alpha : (H_1, V_1) \to (H_2, V_2)$ of forest algebras is actually a pair $(\gamma, \delta)$ of monoid
morphisms $\gamma : H_1 \to H_2$, $\delta : V_1 \to V_2$ such that $\gamma(vh) = \delta(v)\gamma(h)$ for all $h \in H_1$, $v \in V_1$.
However, we will abuse notation slightly and denote both component maps by $\alpha$.

Let A be a finite alphabet, and let us denote by $H_A$ the set of forests over A, and by
$V_A$ the set of contexts over A. Clearly $H_A$ forms a monoid under +, $V_A$ forms a monoid under
composition of contexts (the identity element is the empty context □), and substitution of
a forest into a context defines a left action of $V_A$ on $H_A$. It is straightforward to verify that
this action makes $(H_A, V_A)$ into a forest algebra, which we denote $A^\Delta$. If (H, V) is a forest
algebra, then every map f from A to V has a unique extension to a forest algebra morphism
$\alpha : A^\Delta \to (H, V)$ such that $\alpha(a\Box) = f(a)$ for all $a \in A$. In view of this universal property, we
call $A^\Delta$ the free forest algebra on A.

We say that a forest algebra (H, V) recognizes a forest language $L \subseteq H_A$ if there is a
morphism $\alpha : A^\Delta \to (H, V)$ and a subset X of H such that $L = \alpha^{-1}(X)$. We also say that
the morphism $\alpha$ recognizes L. It is easy to show that a forest language is regular if and only
if it is recognized by a finite forest algebra.
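
As a small illustration of recognition, consider a two-element example chosen here (it is not taken from the paper): H = {0, 1} with + interpreted as max, and V consisting of the identity □ and a context type that sends every element of H to 1. Mapping the letter a to the second element and every other letter to □ yields a morphism recognizing the forests that contain at least one a. The sketch assumes forests are encoded as tuples of (label, children) pairs; all names are hypothetical.

```python
def node(label, *children):
    return (label, tuple(children))

# H = {0, 1} with + = max, so 0 is the identity of H. An element of V is stored
# as the function it induces on H, written as (image of 0, image of 1).
IDENTITY = (0, 1)  # the empty context
TOP = (1, 1)       # sends every forest type to 1; equals both (□ + 1) and (1 + □)

# The map f : A -> V that determines the morphism (an assumption of this example).
LETTER_TYPE = {"a": TOP, "b": IDENTITY, "c": IDENTITY}

def forest_type(forest):
    """Image of a forest under the morphism extending LETTER_TYPE."""
    total = 0
    for label, children in forest:
        # The tree a(s) has type f(a) applied to the type of s.
        total = max(total, LETTER_TYPE[label][forest_type(children)])
    return total

def recognizes(forest, accepting=frozenset({1})):
    return forest_type(forest) in accepting

t = (node("a", node("a"), node("b", node("c"))), node("b"), node("c", node("a"), node("b")))
no_a = (node("b", node("c")), node("c"))
print(recognizes(t), recognizes(no_a))
```

The recursion mirrors the universal property: once f is fixed on letters, the value of every forest is forced by the monoid operation of H and the action of V.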

Given $L \subseteq H_A$ we define an equivalence relation $\sim_L$ on $H_A$ by setting $s \sim_L s'$ if and
only if for every context $p \in V_A$, ps and ps' are either both in L or both outside of L. We
further define an equivalence relation on $V_A$, also denoted by $\sim_L$, by $p \sim_L p'$ if for all $s \in H_A$,
$ps \sim_L p's$. This pair of equivalence relations defines a congruence of forest algebras on $A^\Delta$.
The quotient $(H_L, V_L)$ is called the syntactic forest algebra of L. The projection morphism of
$A^\Delta$ onto $(H_L, V_L)$ is denoted $\alpha_L$ and called the syntactic morphism of L. $\alpha_L$ always recognizes
L and it is easy to show that L is regular iff $(H_L, V_L)$ is finite.

Idempotents and aperiodicity. We recall the well known notions of idempotent and
aperiodicity. If M is a finite monoid and $m \in M$, then there is a unique element $e = m^n$, where
n > 0, such that e is idempotent, i.e., $e^2 = e$. If we take a common multiple of these
exponents n over all $m \in M$, we obtain an integer $\omega > 0$ such that $m^\omega$ is idempotent for
every $m \in M$. Observe that while infinitely many different values of $\omega$ have this property
with respect to M, the value of $m^\omega$ is uniquely determined for each $m \in M$.

Let (H, V) be a forest algebra. Since we write the operation in H additively, we denote
powers of $h \in H$ by $n \cdot h$, where n > 0. As noted above, H embeds in V, so any $\omega > 0$ that
yields idempotents for V serves as well for H. That is, there is an integer $\omega > 0$ such that
$v^\omega$ is idempotent for all $v \in V$, and $\omega \cdot h$ is idempotent for all $h \in H$.

We say that a finite monoid M is aperiodic if it contains no nontrivial groups. Since the
set of elements of the form $m^\omega m^k$ for $k \geq 0$ is a group, aperiodicity is equivalent to having
$m^\omega = m^{\omega+1}$ for all $m \in M$. In this case we can take $\omega = |M|$. All the finite monoids that we
encounter in this paper are aperiodic. In particular, every $\mathcal{J}$-trivial monoid is aperiodic,
because all elements of a group in a finite monoid generate the same two-sided ideal.
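
Both notions are easy to compute for a monoid given by a multiplication function. The sketch below finds the least exponent that makes every element idempotent (any multiple of it would serve equally well as $\omega$) and then tests the equation $m^\omega = m^{\omega+1}$. The function names and the two example monoids are assumptions made for illustration.

```python
def power(m, n, mul):
    """m multiplied with itself n times, for n >= 1."""
    result = m
    for _ in range(n - 1):
        result = mul(result, m)
    return result

def omega(elements, mul):
    """Least n > 0 such that m^n is idempotent for every element m."""
    n = 1
    while True:
        powers = [power(m, n, mul) for m in elements]
        if all(mul(e, e) == e for e in powers):
            return n
        n += 1

def is_aperiodic(elements, mul):
    w = omega(elements, mul)
    return all(power(m, w, mul) == power(m, w + 1, mul) for m in elements)

def capped(x, y):  # truncated addition on {0, 1, 2}
    return min(x + y, 2)

def mod3(x, y):    # the cyclic group Z_3
    return (x + y) % 3

print(omega([0, 1, 2], capped), is_aperiodic([0, 1, 2], capped))
print(omega([0, 1, 2], mod3), is_aperiodic([0, 1, 2], mod3))
```

The group $Z_3$ fails the test because $1^3$ is the identity while $1^4 = 1$, which is exactly a nontrivial group inside the monoid.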

Pieces. Recall that in Section 2 we defined the piece relation for contexts in the free forest
algebra. We now extend this definition to an arbitrary forest algebra (H, V). The general
idea is that a context $v \in V$ is a piece of a context $w \in V$, denoted by $v \preceq w$, if one can
construct a term (using elements of H and V) which evaluates to w, and then take out some
parts of this term to get v.

Let (H, V) be a forest algebra. We say $v \in V$ is a piece of $w \in V$, denoted by $v \preceq w$,
if $\alpha(p) = v$ and $\alpha(q) = w$ hold for some morphism

$\alpha : A^\Delta \to (H, V)$

and some contexts $p \preceq q$ over A. The relation $\preceq$ is extended to H by setting $g \preceq h$ if
g = v0 and h = w0 for some contexts $v \preceq w$.

As we will see in the proof of Lemma 3.1, in the above definition we can replace the
term "some morphism" by "any surjective morphism". The following example shows that
although the piece relation is transitive in the free algebra $A^\Delta$, it may no longer be so in a
finite forest algebra.

M. BOJANCZYK, L. SEGOUFIN, AND H. STRAUBING

Example: Consider the syntactic algebra of the language {abcd}, which contains only one
forest, which in turn has just one path, labeled by abcd. The context part of the syntactic
algebra has twelve elements: an error element ∞, and one element for each infix of abcd.
We have

$a \preceq aa = \infty = bd \preceq bcd$

but we do not have $a \preceq bcd$.

We will now show that in a finite forest algebra, one can compute the relation $\preceq$ in
time polynomial in |V|. The idea is to use a different but equivalent definition. Let R be
the smallest relation on V that satisfies the following rules, for all $v, v', w, w' \in V$:

□ R v
v R v
vw R v'w'            if v R v' and w R w'
□ + v0 R □ + v'0     if v R v'
v0 + □ R v'0 + □     if v R v'

Lemma 3.1. Over any finite forest algebra the relations R and $\preceq$ are the same.

In any finite algebra, the relation R can be computed by applying the rules until no 
new relations can be added. This gives the following corollary: 

Corollary 3.2. In any given finite forest algebra, the relation $\preceq$ on contexts (also on forests)
can be calculated in polynomial time.
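
The saturation procedure behind the corollary can be sketched as a least-fixpoint computation. The input format is an assumption made for this example: the vertical monoid is a list of elements with a multiplication table `mul`, and two tables `left` and `right` give, for each context type v, the types of □ + v0 and v0 + □. The function name and the two-element algebra used to exercise it are hypothetical.

```python
from itertools import product

def piece_relation(V, hole, mul, left, right):
    """Least relation R on V closed under the five rules defining R."""
    # Rules 1 and 2: □ R v and v R v for every v.
    R = {(hole, v) for v in V} | {(v, v) for v in V}
    while True:
        new = set()
        # Rule 3: vw R v'w' whenever v R v' and w R w'.
        for (v, v2), (w, w2) in product(R, repeat=2):
            new.add((mul[v, w], mul[v2, w2]))
        # Rules 4 and 5: □ + v0 R □ + v'0 and v0 + □ R v'0 + □ whenever v R v'.
        for v, v2 in R:
            new.add((left[v], left[v2]))
            new.add((right[v], right[v2]))
        if new <= R:
            return R
        R |= new

# Two-element example: "hole" is the identity, "top" absorbs everything.
V = ["hole", "top"]
mul = {(x, y): "hole" if x == y == "hole" else "top" for x in V for y in V}
left = {"hole": "hole", "top": "top"}   # type of □ + v0
right = {"hole": "hole", "top": "top"}  # type of v0 + □

print(sorted(piece_relation(V, "hole", mul, left, right)))
```

Each round costs at most $|V|^4$ table lookups and the relation can grow at most $|V|^2$ times, which is consistent with the polynomial bound stated above.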



Proof of Lemma 3.1. We first show the inclusion of R in $\preceq$. Let $\alpha : A^\Delta \to (H, V)$ be any
surjective morphism. A simple induction on the number of steps used to derive v R w
produces contexts $p \preceq q$ with $\alpha(p) = v$ and $\alpha(q) = w$. The surjectivity of $\alpha$ is necessary for
starting the induction in the case □ R v.

For the opposite inclusion, suppose $v \preceq w$. Then there is a morphism $\alpha : A^\Delta \to (H, V)$
and contexts $p \preceq q$ such that $v = \alpha(p)$, $w = \alpha(q)$. We will show that $\alpha(p)$ R $\alpha(q)$ by
induction on the size of p:

• If p is the empty context, then the result follows thanks to the first rule in the definition of
R. If p = a□ then from $p \preceq q$ it follows that $q = q_1 a q_2$ for some contexts $q_1, q_2$ and using
the first three rules in the definition of R we get that $\Box \cdot \alpha(a\Box) \cdot \Box$ R $\alpha(q_1) \cdot \alpha(a\Box) \cdot \alpha(q_2)$
and hence $\alpha(p)$ R $\alpha(q)$.

• If there is a decomposition $p = p_1 p_2$ where $p_1$ and $p_2$ are not empty contexts, then from
$p \preceq q$ there must be a decomposition $q = q_1 q_2$ with $p_1 \preceq q_1$ and $p_2 \preceq q_2$. By induction
we get that $\alpha(p_1)$ R $\alpha(q_1)$ and $\alpha(p_2)$ R $\alpha(q_2)$. Then $\alpha(p)$ R $\alpha(q)$ follows by using the
third rule in the definition of R.

• Suppose now p = s + □ or p = □ + s. We can assume that s is a tree, since otherwise the
context p can be decomposed as $(s_1 + \Box)(s_2 + \Box)$. Since s is a tree, it can be decomposed
as a(p'0), with a being a context with a single letter and the hole below and p' a context
smaller than p. By inspecting the definition of $\preceq$, there must be some decomposition
$q = q_0(a(q'0) + q_1)$ or $q = q_0(q_1 + a(q'0))$, with $p' \preceq q'$. By the induction assumption,
$\alpha(p')$ R $\alpha(q')$. From this the result follows by applying rules three, four and five in the
definition of R.

This argument shows that if $v \preceq w$ with respect to a particular morphism $\alpha$, then v R w
and consequently $v \preceq w$ with respect to every morphism. Thus we have also established the
claim made above that the $\preceq$ relation on H is independent of the underlying morphism. □



4. Piecewise Testable Languages 

The main result in this paper is a characterization of piecewise testable languages: 

Theorem 4.1. A forest language is piecewise testable if and only if its syntactic algebra
satisfies the identity

$u^\omega v = u^\omega = v u^\omega$ (4.1)

for all $u, v \in V_L$ such that $v \preceq u$.

The identity (4.1) is illustrated in Figure 1.

In view of Corollary 3.2, an immediate consequence of Theorem 4.1 is that piecewise
testability is a decidable property.

Corollary 4.2. It is decidable if a regular forest language is piecewise testable. 

Proof. We assume the language is given by its syntactic forest algebra, which can be
computed in polynomial time from any recognizing forest algebra. The identity (4.1) can easily
be verified in time polynomial in $|V_L|$ by enumerating all the elements of $V_L$. □
The above procedure gives an exponential upper bound for the complexity in case the
language is represented by a deterministic or even nondeterministic automaton, since there
is an exponential translation from automata into forest algebras. We do not know if this
upper bound is optimal. In contrast, for languages of words, when the input language is
represented by a deterministic automaton, there is a polynomial-time algorithm for
determining piecewise testability [15].
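
Once the piece relation is available, checking identity (4.1) amounts to a loop over related pairs. The following sketch assumes the vertical monoid is given by a multiplication table `mul` and the piece relation by a set of pairs (v, u) with v a piece of u; it does not verify that these inputs really come from a syntactic forest algebra, and the names are hypothetical.

```python
def idempotent_power(v, mul):
    """The unique idempotent among the positive powers of v."""
    power = v
    while mul[power, power] != power:
        power = mul[power, v]
    return power

def satisfies_identity(pieces, mul):
    """Check that u^ω v = u^ω = v u^ω for every pair (v, u) in `pieces`."""
    for v, u in pieces:
        e = idempotent_power(u, mul)
        if not (mul[e, v] == e == mul[v, e]):
            return False
    return True

# A two-element monoid with an absorbing element, and its piece relation.
mul_ok = {(x, y): "hole" if x == y == "hole" else "top"
          for x in ("hole", "top") for y in ("hole", "top")}
pieces_ok = {("hole", "hole"), ("hole", "top"), ("top", "top")}

# The two-element group: already the reflexive pair (g, g) violates the identity.
mul_group = {("e", "e"): "e", ("e", "g"): "g", ("g", "e"): "g", ("g", "g"): "e"}
pieces_group = {("e", "e"), ("e", "g"), ("g", "g")}

print(satisfies_identity(pieces_ok, mul_ok), satisfies_identity(pieces_group, mul_group))
```

Since every element is a piece of itself, the identity forces $u^\omega u = u^\omega$, so any vertical monoid containing a nontrivial group fails the check immediately.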

In Sections 4.1 and 4.2 we prove both implications of Theorem 4.1. Finally, in
Section 4.3 we give an equivalent statement of Theorem 4.1 where the relation $\preceq$ is not used.
But before we prove the theorem, we would like to show how it relates to the characterization
of piecewise testable word languages given by Simon.

Let M be a monoid. For $m, n \in M$, we write $m \sqsubseteq n$ if m is a (not necessarily
connected) subword of n, i.e. there are elements $n_1, \ldots, n_{2k+1} \in M$ such that

$n = n_1 \cdots n_{2k} n_{2k+1}, \qquad m = n_2 n_4 \cdots n_{2k}.$

We claim that, using this relation, the word characterization can be written in a manner
identical to Theorem 4.1:

Theorem 4.3. A word language is piecewise testable if and only if its syntactic monoid
satisfies the identity

$n^\omega m = n^\omega = m n^\omega$ for $m \sqsubseteq n$. (4.2)

Proof. Recall that Simon's theorem says a word language is piecewise testable if and only
if its syntactic monoid is $\mathcal{J}$-trivial. Therefore, we need to show $\mathcal{J}$-triviality is equivalent
to (4.2). We use an identity known to be equivalent to $\mathcal{J}$-triviality (see, for instance, [9],
Sec. V.3.):

$(nm)^\omega n = (nm)^\omega = m(nm)^\omega.$ (4.3)

Since the above identity is an immediate consequence of (4.2), it suffices to derive (4.2)
from the above. We only show $n^\omega m = n^\omega$. As we assume $m \sqsubseteq n$, there are decompositions

$n = n_1 \cdots n_{2k} n_{2k+1}, \qquad m = n_2 n_4 \cdots n_{2k}.$

By induction on i, we show

$n^\omega n_2 n_4 \cdots n_{2i} = n^\omega.$

The result then follows immediately. The base i = 0 is immediate. In the induction step,
we use the induction assumption to get:

$n^\omega n_2 n_4 \cdots n_{2i} n_{2i+2} = n^\omega n_{2i+2}.$

By applying (4.3), we have

$n^\omega = n^\omega n_1 \cdots n_{2i+1}$

and therefore

$n^\omega n_{2i+2} = n^\omega n_1 \cdots n_{2i+1} n_{2i+2} = n^\omega.$ □
Note that since the vertical monoid V in a forest algebra is a monoid, it would make
syntactic sense to have the relation $\sqsubseteq$ instead of $\preceq$ in Theorem 4.1. Unfortunately, the "if"
part of such a statement would be false, as we will show in Section 4.3. That is why we need
to have a different relation $\preceq$ on the vertical monoid, whose definition involves all parts of
a forest algebra, and not just composition in the vertical monoid.

4.1. Correctness of the identities. In this section we show the easy implication in
Theorem 4.1.

Proposition 4.4. If a language is piecewise testable, then its syntactic algebra satisfies
identity (4.1).

Proof. Fix a language L that is piecewise testable and let n be such that membership of t
in L only depends on the pieces of t with at most n nodes.
We will use the following simple fact:

Fact 4.5. If r is any context, $p \preceq q$ are contexts and t is a forest, then $rpt \preceq rqt$.

We only show the first part of the identity, i.e.

$u^\omega v = u^\omega$ for $v \preceq u$.

Fix $v \preceq u$ as above. By definition of $\omega$, we can write the identity as an implication:
for $k \in \mathbb{N}$, if $u^k = u^k \cdot u^k$ then $u^k \cdot v = u^k$. Let k be as above. Let $p \preceq q$ be contexts that
are mapped to v and u respectively by the syntactic morphism of L. By unraveling the
definition of the syntactic algebra, we need to show that

$r q^k p t \in L$ iff $r q^k t \in L$

holds for any context r and forest t. Consider now the forests

$r q^{ik} t$ and $r q^{ik} p t$ for $i \in \mathbb{N}$.

As $\Box \preceq p \preceq q$, thanks to Fact 4.5 we get

$r q^{ik} t \preceq r q^{ik} p t \preceq r q^{(i+1)k} t.$

When i increases, the set of pieces of size at most n of $r q^{ik} t$ can only grow. As there are only
finitely many pieces of size at most n, for i sufficiently large, the two forests $r q^{ik} t$ and $r q^{(i+1)k} t$ have
the same set of pieces of size at most n. Therefore, for sufficiently large i, the two forests $r q^{ik} t$ and
$r q^{ik} p t$ have the same set of pieces of size at most n, and either both belong to L, or both are outside
L. However, since $\alpha_L(q^k) = \alpha_L(q^k q^k)$, we have

$r q^{ik} t \in L$ iff $r q^k t \in L$

$r q^{ik} p t \in L$ iff $r q^k p t \in L$,

which gives the desired result. □
4.2. Completeness of the identities. This section is devoted to showing completeness
of the identities: an algebra that satisfies identity (4.1) in Theorem 4.1 can only recognize
piecewise testable languages. We fix an alphabet A, and a forest language L over this
alphabet, whose syntactic forest algebra $(H_L, V_L)$ satisfies the identity. We will write $\alpha$
rather than $\alpha_L$ to denote the syntactic morphism of L, and sometimes use the term "type
of s" for the image $\alpha(s)$ (likewise for contexts).

We write $s \sim_n t$ if the two forests s, t have the same pieces of size no more than n.
Likewise for contexts. The completeness part of Theorem 4.1 follows from the following two
results.

Lemma 4.6. Let $n \in \mathbb{N}$. For k sufficiently large, if two forests satisfy $s \sim_k s'$, then they
have a common piece t in the same $\sim_n$-class, i.e.

$t \preceq s, \quad t \preceq s', \quad t \sim_n s, \quad \text{and} \quad t \sim_n s'.$

Proposition 4.7. For n sufficiently large, $pat \sim_n pt$ entails $\alpha(pat) = \alpha(pt)$.

Proof of the completeness part of Theorem 4.1. Take n as in Proposition 4.7, and then
apply Lemma 4.6 to this n, yielding k. We show that $s \sim_k s'$ implies $s \in L \iff s' \in L$,
which immediately shows that L is piecewise testable, by inspecting pieces of size k. Indeed,
assume $s \sim_k s'$, and let t be their common piece as in Lemma 4.6. Since t is a piece of s
with the same pieces of size n, it can be obtained from s by a sequence of steps where a
single letter is removed in each step without affecting the $\sim_n$-class. Each such step preserves
the type thanks to Proposition 4.7. Applying the same argument to s', we get

$\alpha(s) = \alpha(t) = \alpha(s')$,

which gives the desired conclusion. □

We begin by showing Lemma 4.6, and then the rest of this section is devoted to proving
Proposition 4.7, the more involved of the two results.

Proof of Lemma 4.6. We begin with the following observation.

Fact 4.8. Let $n \in \mathbb{N}$ and let K be a regular language. There is some constant k, such that
every $t \in K$ contains a piece $s \in K$ of size at most k such that $s \sim_n t$.

Proof of Fact 4.8. Let $\beta : A^\Delta \to (H, V)$ be a morphism into a finite forest algebra. Let
m = |H|. There is a k such that every forest s of size greater than k can be written as
$s = q_0 q_1 \cdots q_m s'$ where s' is a forest and the $q_i$ are nonempty contexts: this is because
every large enough forest contains either a collection of m siblings or a chain of length m. It
follows that the sequence of values $\beta(s'), \beta(q_m s'), \beta(q_{m-1} q_m s'), \ldots, \beta(q_1 \cdots q_m s')$ contains a
repeat, and so we can remove a subsequence of the $q_i$ and obtain a proper piece t of s such
that $\beta(s) = \beta(t)$. Thus every forest s has a piece t of size at most k such that $\beta(s) = \beta(t)$.

Now let (H, V) be the direct product of the syntactic algebra $(H_K, V_K)$ and the quotient
algebra $A^\Delta / \sim_n$, and let $\beta$ be the product of the syntactic morphism of K and the natural
projection onto the quotient by $\sim_n$. If $s \in K$ then there is a piece t of s of size at most k
such that $\beta(s) = \beta(t)$. Thus $t \in K$ and $s \sim_n t$, proving the Fact. □

We are now ready to prove Lemma 4.6. Fix $n \in \mathbb{N}$. Notice that each $\sim_n$-class is a regular language and $\sim_n$ has finitely many classes. For each $\sim_n$-class $K$, Fact 4.8 gives a constant $k_K$. Let $k$ be the maximum of $n$ and all these $k_K$; we claim the lemma holds for $k$. Indeed, take any two forests $s \sim_k s'$. Let $t$ be a piece of $s$ of size at most $k$ with $s \sim_n t$, as given by Fact 4.8. Since $s \sim_k s'$, the forest $t$ is also a piece of $s'$. Furthermore, since $\sim_k$ implies $\sim_n$ (by $k \ge n$), we get $s' \sim_n s \sim_n t$, which implies $s' \sim_n t$ by transitivity of $\sim_n$. □

We now show Proposition 4.7. Let us fix a context $p$, a label $a$ and a forest $t$ as in the statement of the proposition. The context $p$ may be empty, and so may be the forest $t$. We search for the appropriate $n$; the size of $n$ will be independent of $p$, $a$, $t$. We also fix the types $v = \alpha(p)$, $h = \alpha(t)$ for the rest of this section. In terms of these types, our goal is to show that $vh = v\alpha(a)h$. To avoid clutter, we will sometimes identify $a$ with its image $\alpha(a)$, and write $vh = vah$ instead of $vh = v\alpha(a)h$.

Let $s$ be a forest and $X$ be a set of nodes in $s$. The restriction of $s$ to $X$, denoted $s[X]$, is the piece of $s$ obtained by only keeping the nodes in $X$.
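
The restriction operation can be illustrated with a small sketch. The encoding below is an assumption made only for this example and is not part of the paper: a forest is a tuple of trees, and a tree is a triple `(node_id, label, children)`. The hypothetical helper `restrict` deletes every node outside the chosen set and lets its children take its place, keeping the order of siblings.

```python
# Assumed encoding (illustration only): forest = tuple of trees,
# tree = (node_id, label, children), children = forest.

def restrict(forest, keep):
    """Return s[X]: drop every node whose id is not in `keep`,
    promoting the (restricted) children of a dropped node to its place."""
    result = ()
    for node_id, label, children in forest:
        kept_children = restrict(children, keep)
        if node_id in keep:
            result += ((node_id, label, kept_children),)
        else:
            result += kept_children
    return result


# The forest s = d(c(a + b)), with node ids 0..3.
a = (2, "a", ())
b = (3, "b", ())
c = (1, "c", (a, b))
d = (0, "d", (c,))
s = (d,)

# Restricting to the nodes labelled d, a, b yields the piece d(a + b).
piece = restrict(s, {0, 2, 3})
```

Under this encoding, a $vah$-decomposition is simply a choice of a set of node ids together with one distinguished id in that set.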

Let $s$ be a forest, $X$ a set of nodes in $s$, and $x \in X$. We say that $x \in X$ is a $vah$-decomposition of $s$ if: a) if we restrict $s$ to $X$, remove descendants of $x$, and place the hole in $x$, the resulting context has type $v$; b) the node $x$ has label $a$; c) if we restrict $s$ to $X$ and only keep nodes in $X$ that are proper descendants of $x$, the resulting forest has type $h$.

Definition 4.9. A fractal of length $k$ inside a forest $s$ is a sequence $x_1 \in X_1, \ldots, x_k \in X_k$ of $vah$-decompositions of $s$, where $X_i \subseteq X_{i+1} \setminus \{x_{i+1}\}$ holds for $i < k$.

A subfractal is extracted by only using a subsequence

$$x_{i_1} \in X_{i_1}, \ldots, x_{i_j} \in X_{i_j}$$

of the $vah$-decompositions. Such a subsequence is also a fractal.

Lemma 4.10. Let $k \in \mathbb{N}$. For $n$ sufficiently large, $pat \sim_n pt$ entails the existence of a fractal of length $k$ inside $pat$.

Proof. The proof is by induction on $k$. The case $k = 1$ is obvious.

Assume the lemma is proved for $k$ and $n$ and consider the case $k + 1$.

The set of forests which have a fractal of length $k$ is a regular language, call it $K$. By Fact 4.8 applied to $K$, there is some constant $m$ such that every forest in $K$ has a piece that is also in $K$, and whose size is bounded by $m$. (In this reasoning, we do not use the parameter $n$ of Fact 4.8, so we can call Fact 4.8 with $n = 0$.) We can assume without loss of generality that $m > n$. In other words, if a forest has a fractal of length $k$, then it has a piece of size at most $m$ which has a fractal of length $k$. This means that if a forest has a fractal of length $k$, then it has a fractal of length $k$ which has at most $m$ nodes (the number of nodes in a fractal is the number of nodes in the largest of its $vah$-decompositions).

Assume now that $pat \sim_m pt$. By the induction assumption, as $m > n$, we have a fractal of length $k$ inside $pat$. From the previous observation, this fractal can be assumed to be of size at most $m$. Since $pat$ and $pt$ have the same pieces of size at most $m$, we obtain a piece of $pt$ which is a fractal of length $k$ inside $pt$. Clearly, this resulting fractal can be extended to a fractal of length $k + 1$ by taking for $X_{k+1}$ all the nodes of $pat$ and for $x_{k+1}$ the node $a$. □

Figure 2: Two types of tame fractal.

Thanks to the above lemma, Proposition 4.7 is a consequence of the following result:

Proposition 4.11. For $k$ sufficiently large, the existence of a fractal of length $k$ inside $pat$ entails $vh = vah$.

The rest of this section is devoted to a proof of this proposition. The general idea is as follows. Using some simple combinatorial arguments, and also Ramsey's Theorem, we will show that there is also a large subfractal whose structure is very regular, or tame, as we call it. We will then apply identity (4.1) to this regular fractal, and show that a node with label $a$ can be eliminated without affecting the type.

A fractal $x_1 \in X_1, \ldots, x_k \in X_k$ inside a forest $s$ is called tame if $s$ can be decomposed as $s = q q_1 \cdots q_k s'$ (or $s = q q_k \cdots q_1 s'$) such that for each $i = 1, \ldots, k$, the node $x_i$ is part of the context $q_i$, see Fig. 2. This does not necessarily mean that the nodes $x_1, \ldots, x_k$ form a chain, since some of the contexts $q_i$ may be of the form $\Box + t$.

Lemma 4.12. Let $k \in \mathbb{N}$. For $n$ sufficiently large, if there is a fractal of length $n$ inside $pat$, then there is a tame fractal of length $k$ inside $pat$.

Proof. The main step is the following claim.

Claim 4.13. Let $m \in \mathbb{N}$. For $n$ sufficiently large, for every forest $s$, and every set $X$ of at least $n$ nodes, there is a decomposition $s = q q_1 \cdots q_m s'$ where every context $q_i$ contains at least one node from $X$.

Proof. Let $Y$ be the smallest set of nodes that contains $X$ and is closed under closest common ancestors. If $n$ is chosen large enough, either $s[Y]$ consists of more than $m$ trees, or it contains a node having more than $m$ children, or $s[Y]$ contains a chain of length bigger than $m$. We are thus left with three cases:

• In the set $Y$, there is a chain $y_1 < \cdots < y_{m+1}$. For $i \in \{1, \ldots, m\}$, consider the set of nodes

$$Y_i = \{z : z \ge y_i \text{ and } z \not\ge y_{i+1}\}.$$

Each set $Y_i$ contains at least one node of $X$, by definition of the set $Y$. The decomposition in the statement of the claim is chosen so that the context $q_i$ corresponds to the set $Y_i$. The context $q$ corresponds to all nodes that are not descendants of $y_1$, and the forest $s'$ corresponds to all descendants of $y_{m+1}$.

• There is a node $y \in Y$ such that at least $m + 1$ children of $y$ have some node from $Y$ (and therefore also $X$) in their subtree. Let $t$ be the forest containing all proper descendants of $y$. By assumption on $y$, the forest $t$ can be decomposed as $t = t_1 + \cdots + t_{m+1}$ so that each of the forests $t_i$ contains at least one node from $X$. For the decomposition in the statement of the claim, we define $q$ to be the set of nodes outside $t$, which includes $y$, and we define $q_i$ to be $t_i + \Box$ and $s'$ as $t_{m+1}$.

• The forest $s$ can be decomposed as $s = t_1 + \cdots + t_{m+1}$ so that each of the forests $t_i$ contains at least one node from $X$. We conclude as in the previous case but with an empty $q$. □
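
The set $Y$ used in the proof above can be computed by a straightforward fixed-point iteration. The following is a minimal sketch under assumptions that are not in the paper: the forest is given by a hypothetical `parent` dictionary mapping each node to its parent (roots map to `None`), and a node counts as an ancestor of itself.

```python
def closest_common_ancestor(parent, x, y):
    """Closest common ancestor of x and y, or None if x and y lie in
    different trees of the forest. A node is an ancestor of itself."""
    ancestors_of_x = set()
    node = x
    while node is not None:
        ancestors_of_x.add(node)
        node = parent[node]
    node = y
    while node is not None and node not in ancestors_of_x:
        node = parent[node]
    return node


def cca_closure(parent, nodes):
    """Smallest superset of `nodes` closed under closest common ancestors."""
    closed = set(nodes)
    changed = True
    while changed:
        changed = False
        for x in list(closed):
            for y in list(closed):
                z = closest_common_ancestor(parent, x, y)
                if z is not None and z not in closed:
                    closed.add(z)
                    changed = True
    return closed


# A forest with two trees: 0(1(3 + 4) + 2(5)) and the single node 6.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: None}
Y = cca_closure(parent, {3, 4, 5})
```

In this example the closure adds node 1 (for the pair 3, 4) and node 0 (for the pair 3, 5), which are exactly the branching points that the three cases of the proof inspect.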

We now come back to the proof of the lemma. For $k \in \mathbb{N}$ let $n$ be the number defined by Claim 4.13 for $m = k^2$. Let $x_1 \in X_1, \ldots, x_n \in X_n$ be a fractal of length $n$ inside $s = pat$. We apply Claim 4.13 with $X = \{x_1, \ldots, x_n\}$ and obtain a decomposition $s = q q_1 \cdots q_m s'$. For each $i = 1, \ldots, m$ the context $q_i$ contains at least one node of $X$. We choose arbitrarily one of them and denote it by $x_{n_i}$. Unfortunately, the function $i \mapsto n_i$ need not be monotone, as required in a tame fractal. However, we can always extract a monotone subsequence, since any number sequence of length $k^2$ is known to have a monotone subsequence of length $k$ [10].

□
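
The extraction of a monotone subsequence is constructive. The sketch below is an illustration only, not the argument of [10]: it finds a longest increasing and a longest decreasing subsequence by quadratic dynamic programming and returns the longer one. The function names are hypothetical, and the entries are assumed pairwise distinct, as the indices $n_i$ are.

```python
def longest_subsequence(seq, before):
    """Longest subsequence in which consecutive entries x, y satisfy
    before(x, y); computed by O(n^2) dynamic programming."""
    n = len(seq)
    if n == 0:
        return []
    best = [1] * n    # best[i]: length of the best subsequence ending at i
    prev = [-1] * n   # prev[i]: predecessor index used to rebuild it
    for i in range(n):
        for j in range(i):
            if before(seq[j], seq[i]) and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    out = []
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]


def monotone_subsequence(seq):
    """A longest strictly increasing or strictly decreasing subsequence."""
    increasing = longest_subsequence(seq, lambda x, y: x < y)
    decreasing = longest_subsequence(seq, lambda x, y: x > y)
    return increasing if len(increasing) >= len(decreasing) else decreasing


# Nine distinct indices (k = 3, so k^2 = 9 entries).
indices = [4, 1, 6, 2, 8, 3, 7, 5, 9]
chosen = monotone_subsequence(indices)
```

For $k = 3$ the bound of the lemma guarantees that `chosen` has at least three entries; the corresponding contexts $q_i$ are then merged so that the selected nodes appear in monotone order.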

We now assume there is a tame fractal $x_1 \in X_1, \ldots, x_k \in X_k$ inside $s = pat$, which is decomposed as $s = q q_1 \cdots q_k s'$, with the node $x_i$ belonging to the context $q_i$. The dual case, when the decomposition is $s = q q_k \cdots q_1 s'$, corresponding to a decreasing sequence in the proof of Lemma 4.12, is treated analogously.

The general idea is as follows. We will define a notion of monochromatic tame fractal, and show that $vah = vh$ follows from the existence of a large enough monochromatic tame fractal. Furthermore, a large monochromatic tame fractal can be extracted from any sufficiently large tame fractal thanks to Ramsey's Theorem.

Let $i, j, l$ be such that $0 \le i < j \le l \le k$. We define $u_{ijl}$ to be the image under $\alpha$ of the context obtained from $q_{i+1} \cdots q_j$ by only keeping the nodes from $X_l$ (with the hole staying where it is). We define $w_{ijl}$ to be the image under $\alpha$ of the context obtained from $q_{i+1} \cdots q_j$ by only keeping the nodes from $X_l \setminus \{x_l\}$. Straight from this definition, as $X_l \subseteq X_{l+1}$, we have

$$w_{ijl} \preceq u_{ijl} \quad\text{and}\quad u_{ijl} \preceq u_{ij(l+1)} \tag{4.4}$$

A tame fractal is called monochromatic if for all $i < j < l$ and all $i' < j' < l'$ taken from $\{1, \ldots, k\}$, we have

$$u_{ijl} = u_{i'j'l'}.$$

Note that in the above definition, we require $j < l$, even though $u_{ijl}$ is defined even when $j = l$.

We apply the following form of Ramsey's Theorem (see, for example, Bollobás [7]): Let $c, r, k$ be positive integers. Then there exists an integer $N$ with the following property. Let $|S| \ge N$, and suppose that the subsets of $S$ of cardinality $r$ are colored with $c$ colors. Then there exists a subset $T$ of $S$ with $|T| \ge k$ such that all subsets of $T$ of cardinality $r$ have the same color.

Let $\omega$ be the exponent associated to the syntactic forest algebra $(H_L, V_L)$ as defined in Section 3. If there is a tame fractal of size $N$ inside $s$, then the map $\{i, j, l\} \mapsto u_{ijl}$ gives us a coloring of the cardinality 3 subsets of $\{1, \ldots, N\}$ with $|V_L|$ colors. By Ramsey's Theorem, if $N$ is sufficiently large, there is a monochromatic fractal of length $k = \omega + 1$ inside $s$.

We conclude by showing the following result: 

Lemma 4.14. If there is a monochromatic tame fractal of length $k = \omega + 1$ inside $pat = q q_1 \cdots q_k s'$, then $vah = vh$.

Proof. Fix a monochromatic tame fractal $x_1 \in X_1, \ldots, x_k \in X_k$ inside a forest $s = pat = q q_1 \cdots q_k s'$. Since $x_k \in X_k$ is a $vah$-decomposition, the statement of the lemma follows if $\alpha$ assigns the same type to the two restrictions $s[X_k]$ and $s[X_k \setminus \{x_k\}]$.

Recall the definition of $u_{ijl}$ and $w_{ijl}$ above. The type of the forest $s[X_k]$ can be decomposed as

$$\alpha(s[X_k]) = \alpha(q[X_k]) \cdot u_{01k} \cdot u_{12k} \cdot u_{23k} \cdots u_{(k-1)kk} \cdot \alpha(s'[X_k]).$$

The type of $s[X_k \setminus \{x_k\}]$ is decomposed the same way, only $u_{(k-1)kk}$ is replaced by $w_{(k-1)kk}$. Therefore, the lemma will follow if

$$u_{01k} \cdot u_{12k} \cdot u_{23k} \cdots u_{(k-1)kk} = u_{01k} \cdot u_{12k} \cdot u_{23k} \cdots w_{(k-1)kk}.$$

Since the fractal is monochromatic, and since $k = \omega + 1$, the above becomes

$$u_{01k}^{\omega} \cdot u_{(k-1)kk} = u_{01k}^{\omega} \cdot w_{(k-1)kk}.$$

By (4.4) and monochromaticity we have

$$w_{(k-1)kk} \preceq u_{(k-1)k(k+1)} = u_{01k},$$

$$u_{(k-1)kk} \preceq u_{(k-1)k(k+1)} = u_{01k}.$$

Therefore identity (4.1) can be applied to show that both sides are equal to $u_{01k}^{\omega}$. Note that we use only one side of identity (4.1), $u^{\omega} v = u^{\omega}$. We would have used the other side when considering the case when $s = q q_k \cdots q_1 s'$. □



Figure 3: The identity $\omega \cdot (vuh) = \omega \cdot (vuh) + vh$, with the white nodes belonging to $u$.

4.3. An equivalent set of identities. In this section, we rephrase the identities used in Theorem 4.1. There are two reasons to rephrase the identities.

The first reason is that identity (4.1) refers to the relation $v \preceq w$. One consequence is that we need to prove Corollary 3.2 before concluding that identity (4.1) can be checked effectively.

The second reason is that we want to pinpoint how identity (4.1) diverges from $\mathcal{J}$-triviality of the context monoid $V$. Consider the forest language "all trees in the forest are of the form $aa$". It is easy to verify that the syntactic forest algebra of this language is such that $V$ is $\mathcal{J}$-trivial. But this language is not piecewise testable, since for any $k > 0$, the forests $k \cdot aa$ and $k \cdot aa + a$ contain the same pieces of size at most $k$, but the first of these forests is in the language, while the second is not.
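
The claim about $k \cdot aa$ and $k \cdot aa + a$ can be checked by brute force for small $k$. The sketch below relies on an encoding that is only assumed for this illustration: a forest is a tuple of trees and a tree is a pair `(label, children)`. The hypothetical function `pieces` enumerates all forests obtained by removing nodes, where the children of a removed node take its place.

```python
def size(forest):
    """Number of nodes of a forest."""
    return sum(1 + size(children) for _, children in forest)


def pieces(forest):
    """Set of all pieces of `forest` (all results of removing nodes)."""
    if not forest:
        return {()}
    (label, children), rest = forest[0], forest[1:]
    # Either the first root is removed and its children are promoted ...
    removed = pieces(children + rest)
    # ... or it is kept, above a piece of its children.
    kept = {((label, below),) + after
            for below in pieces(children)
            for after in pieces(rest)}
    return removed | kept


A = ("a", ())        # the single node a
AA = ("a", (A,))     # the tree aa


def same_small_pieces(k):
    """Do k*aa and k*aa + a have the same pieces of size at most k?"""
    first = (AA,) * k
    second = first + (A,)
    small = lambda f: {p for p in pieces(f) if size(p) <= k}
    return small(first) == small(second)
```

For instance, `same_small_pieces(3)` compares the pieces of size at most 3 of $3 \cdot aa$ and $3 \cdot aa + a$. The two forests are only told apart by a piece of size $k + 1$, namely $(k+1) \cdot a$, which is a piece of the second forest but not of the first.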

The proposition below identifies an additional condition (depicted in Figure 3) that must be added to $\mathcal{J}$-triviality.

Proposition 4.15. Identity (4.1) is equivalent to $\mathcal{J}$-triviality of $V$ together with the identity

$$vh + \omega \cdot vuh = \omega \cdot vuh = \omega \cdot vuh + vh \tag{4.5}$$

Proof. One implication is obvious: both $\mathcal{J}$-triviality and (4.5) follow from (4.1). For the other implication, we assume $V$ is $\mathcal{J}$-trivial and that (4.5) holds. We must show that if $v \preceq u$, then

$$u^{\omega} v = u^{\omega} = v u^{\omega}.$$

We will only show the first equality, the other is done the same way. By unraveling the definition of $v \preceq u$, there is a morphism

$$\alpha : A^\Delta \to (H, V)$$

and two contexts $p \preceq q$ over $A$ such that $\alpha(p) = v$ and $\alpha(q) = u$. The proof goes by induction on the size of $p$.

If $p$ can be decomposed as $p_1 p_2$ with $p_1, p_2$ nonempty, then we have $p_1 \preceq q$ and $p_2 \preceq q$ and, by induction, $\alpha(q)^{\omega} \cdot \alpha(p_1) = \alpha(q)^{\omega}$ and $\alpha(q)^{\omega} \cdot \alpha(p_2) = \alpha(q)^{\omega}$. Hence we get:

$$\alpha(q)^{\omega} \cdot \alpha(p_1) \cdot \alpha(p_2) = \alpha(q)^{\omega} \cdot \alpha(p_2) = \alpha(q)^{\omega}.$$

If $p$ consists of a single node with a hole below, then we have $q = q_0 p q_1$ for some two contexts $q_0, q_1$, and therefore also $u = u_0 v u_1$ for some $u_0, u_1$. The result then follows by $\mathcal{J}$-triviality of $V$ (recall that $\mathcal{J}$-triviality implies identity (4.3)):

$$u^{\omega} v = (u_0 v u_1)^{\omega} v = (u_0 v u_1)^{\omega} u_0 v = (u_0 v u_1)^{\omega} = u^{\omega}.$$

In the above, we used identity (4.3) twice: once when adding $u_0$ to $u^{\omega}$, and then when removing $u_0 v$ from after $u^{\omega}$.

The interesting case is when $p = \Box + s$ for some tree $s$. In this case, the context $q$ can be decomposed as $q_1(\Box + t)q_2$, with $s \preceq t$. We have

$$u^{\omega} v = \alpha(q_1(\Box + t)q_2)^{\omega}\, \alpha(\Box + s).$$

Thanks to identity (4.3), the above can be rewritten as

$$u^{\omega} v = \alpha(q_1(\Box + t)q_2)^{\omega}\, (\alpha(\Box + t))^{\omega}\, \alpha(\Box + s).$$

Notice now that

$$(\alpha(\Box + t))^{\omega}\, \alpha(\Box + s) = \Box + \alpha(s) + \omega \cdot \alpha(t).$$

It is therefore sufficient to show that $s \preceq t$ implies

$$\omega \cdot \alpha(t) = \alpha(s) + \omega \cdot \alpha(t).$$

The proof of the above equality is by induction on the number of nodes that need to be removed from $t$ to get $s$. The base case $s = t$ follows by aperiodicity of $H$, which follows by aperiodicity of $V$, itself a consequence of $\mathcal{J}$-triviality. Consider now the case when $t$ is bigger than $s$. In particular, we can remove a node from $t$ and still have $s$ as a piece. In other words, there is a decomposition $t = q_0 a t'$ such that $s \preceq q_0 t'$. Applying the induction assumption, we get

$$\omega \cdot \alpha(q_0 t') = \alpha(s) + \omega \cdot \alpha(q_0 t').$$

Furthermore, applying identity (4.5), we get

$$\omega \cdot \alpha(t) = \alpha(q_0 t') + \omega \cdot \alpha(t) = \omega \cdot \alpha(q_0 t') + \omega \cdot \alpha(t).$$

Combining the two equalities, we get the desired result. □

5. Closest common ancestor 

According to the definition of piece in Section 2, $t = d(a + b)$ is a piece of the forest $s = dc(a + b)$. In this section we consider a notion of piece which does not allow removing the closest common ancestor of two nodes, in particular removing the node $c$ in the example above. The logical counterpart of this notion is a signature where the closest common ancestor (a three argument predicate) is added.

Recall that in a forest $s$ we say that a node $z$ is the closest common ancestor of the nodes $x$ and $y$, denoted $z = x \sqcap y$, if $z$ is an ancestor of both $x$ and $y$ and all other nodes of $s$ with this property are ancestors of $z$. Note that the ancestor relation can be defined in terms of the closest common ancestor, since a node $x$ is an ancestor of $y$ if and only if $x$ is the closest common ancestor of $x$ and $y$. We now say that a forest $s$ is a cca-piece of a forest $t$, and write this as $s \trianglelefteq t$, if there is an injective mapping from nodes of $s$ to nodes of $t$ that preserves the label of the node together with the forest-order and the closest common ancestor relationship (the ancestor relationship is then necessarily preserved). An equivalent definition is that the cca-piece relation is the reflexive transitive closure of the relation

$$\{(pt, pat) : p \text{ is a context, } a \text{ is a node, } t \text{ is a tree or empty}\}$$

Notice the difference with the notion of piece as defined in Section 2, where $t$ could be an arbitrary forest. Similarly we say that a context $p$ is a cca-piece of the context $q$, written $p \trianglelefteq q$, if there is an injective mapping from $p$ to $q$ as above that also preserves the hole.
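
The difference between the two notions of piece can be made concrete on the example $dc(a + b)$. The sketch below uses an encoding assumed only for this illustration (a forest is a tuple of trees, a tree is a pair `(label, children)`), and both function names are hypothetical. It follows the closure-based definitions: for pieces any node may be removed, while for cca-pieces a node may be removed only when what remains below it is a tree or empty.

```python
def pieces(forest):
    """All pieces: any node may be removed, its children are promoted."""
    if not forest:
        return {()}
    (label, children), rest = forest[0], forest[1:]
    removed = pieces(children + rest)
    kept = {((label, below),) + after
            for below in pieces(children)
            for after in pieces(rest)}
    return removed | kept


def cca_pieces(forest):
    """All cca-pieces: a node may be removed only if the part kept
    below it is a single tree or empty."""
    if not forest:
        return {()}
    (label, children), rest = forest[0], forest[1:]
    result = set()
    for below in cca_pieces(children):
        for after in cca_pieces(rest):
            result.add(((label, below),) + after)   # keep the root
            if len(below) <= 1:                     # tree or empty
                result.add(below + after)           # remove the root
    return result


def leaf(label):
    return (label, ())


s = (("d", (("c", (leaf("a"), leaf("b"))),)),)   # dc(a + b)
t = (("d", (leaf("a"), leaf("b"))),)             # d(a + b)

t_is_piece = t in pieces(s)
t_is_cca_piece = t in cca_pieces(s)
```

Here `t_is_piece` holds while `t_is_cca_piece` does not, since obtaining $d(a + b)$ would require removing $c$, the closest common ancestor of $a$ and $b$. Every cca-piece is in particular a piece.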

A forest language $L$ is called cca-piecewise testable if there exists $n > 0$ such that membership of $t$ in $L$ depends only on the set of cca-pieces of $t$ of size $n$.

As before, every cca-piecewise testable language is regular and an analogue of Proposition 2.1 holds as well.

Proposition 5.1. A forest language is cca-piecewise testable iff it is definable by a Boolean combination of $\Sigma_1(\sqcap, <_{\mathrm{dfs}})$ formulas.

Recall that the ancestor relation can be expressed using the closest common ancestor relation, hence $\Sigma_1(\sqcap, <_{\mathrm{dfs}})$ could be replaced by $\Sigma_1(\sqcap, <_{\mathrm{dfs}}, <)$ in the statement of Proposition 5.1. A first remark is that there are more cca-piecewise testable languages than there are piecewise testable ones. Hence the identities that characterize piecewise testable languages are no longer valid. In particular, in the syntactic algebra of a cca-piecewise testable language, the context monoid $V$ may no longer be $\mathcal{J}$-trivial. To see this consider the language $L$ of forests over $\{a, b, c\}$ that contain the cca-piece $a(b + c)$. This is the language "some $a$ is the closest common ancestor of some $b$ and $c$". Then, for all $n$, the context $p = (ab)^n \Box$ is not equivalent to the context $q = (ab)^n a \Box$, as $p(b + c) \notin L$ while $q(b + c) \in L$. Hence the identity $(uv)^{\omega} = (uv)^{\omega} u$ does not hold in the syntactic context monoid of $L$. However, as we noted earlier, any $\mathcal{J}$-trivial monoid satisfies this identity. Note however that $p$ and $q$ satisfy the equivalence $pt \in L$ iff $qt \in L$ for all trees $t$. The characterization below is a generalization of this idea of distinguishing trees from forests.

We call a context a tree-context if it is nonempty and has one node that is the ancestor 
of all other nodes, including the hole. 

In the presence of the closest common ancestor, the algebraic situation is more complicated as well: cca-piecewise testability of a forest language $L$ is not determined by the syntactic forest algebra alone. To obtain an algebraic characterization of this class of languages, it is necessary to look at the syntactic morphism $\alpha_L : A^\Delta \to (H_L, V_L)$ that maps each $(h, v)$ to its $\sim_L$-class, and not just the image of this morphism. (We can be considerably more precise about this: The distinction is that the cca-piecewise testable languages do not form a variety of languages in the sense described by Eilenberg [9]. In particular, this family of languages lacks the crucial property of being closed under inverse images of morphisms between free forest algebras; this fails if the morphism maps some generator $a\Box$ to the empty context, or to a context of the form $p + s$, where $p$ is a context and $s$ is a nonempty forest. However, cca-piecewise testable languages satisfy all the other properties of varieties of languages and in particular they are closed under inverse images of homomorphisms that are "tree-preserving", i.e., the image of $a\Box$ is a tree-context $p$ for all $a$. Varieties of forest languages are discussed in [4].)

We extend the cca-piece relation to elements of a forest algebra $(H, V)$ in the presence of a morphism $\alpha : A^\Delta \to (H, V)$ as follows: we write $v \trianglelefteq w$ if there are contexts $p \trianglelefteq q$ that are mapped to $v$ and $w$ respectively by the morphism $\alpha$. There is a subtle difference here with the definition of $\preceq$ given in Section 2: the $\trianglelefteq$ relation on $V$ depends on the morphism $\alpha$. Similarly we define the notion of $g \trianglelefteq h$ for $g, h \in H$.

The elements of $V$ that are images under the morphism $\alpha$ of a tree-context are called tree-context-types. Similarly, the elements of $H$ that are images of a tree are called tree-types (it is possible for an element to be an image of both a tree and a non-tree, but it is still called a tree-type here). Note that the notions of tree-type and of tree-context-type are relative to $\alpha$.

Theorem 5.2. A forest language $L$ is cca-piecewise testable if and only if its syntactic algebra and syntactic morphism satisfy the following identities:

$$u^{\omega} h = u^{\omega} v h = v u^{\omega} h \tag{5.1}$$

whenever $h$ is a tree-type or empty, and $v \trianglelefteq u$ are tree-context-types, and

$$\omega \cdot h = \omega \cdot h + g = g + \omega \cdot h \quad \text{if } g \trianglelefteq h \tag{5.2}$$

Because of the finiteness of the syntactic forest algebra $(H_L, V_L)$ one can effectively decide whether an element of one of these monoids is the image of a tree-context or of a tree. Whether or not $v \trianglelefteq u$ or $g \trianglelefteq h$ holds can be decided in polynomial time using an algorithm as in Corollary 3.2 based on the following equivalent definition of $\trianglelefteq$: Let $(H, V)$ be a forest algebra and $\alpha$ a surjective morphism from $A^\Delta$ to $(H, V)$. Let then $R$ be the smallest relation on $V$ that satisfies the following rules, for all $v, v', w, w' \in V$ and all $a \in A$:

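$$\begin{array}{ll}
\Box \mathrel{R} v & \\
\alpha(a)\, v \mathrel{R} \alpha(a)\, v' & \text{if } v \mathrel{R} v' \\
vw \mathrel{R} v'w' & \text{if } v \mathrel{R} v' \text{ and } w \mathrel{R} w' \text{ and } w, w' \text{ are tree-context-types} \\
vw \mathrel{R} v'w' & \text{if } v \mathrel{R} v' \text{ and } w \mathrel{R} w' \text{ and } v, v' \text{ are of the form } (s + \Box + t) \\
(\Box + v0) \mathrel{R} (\Box + v'0) & \text{if } v \mathrel{R} v' \\
(v0 + \Box) \mathrel{R} (v'0 + \Box) & \text{if } v \mathrel{R} v'
\end{array}$$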

Lemma 5.3. For any finite $(H, V)$ and surjective morphism $\alpha$, the relations $R$ and $\trianglelefteq$ are the same.

Proof. We first show the inclusion of $R$ in $\trianglelefteq$. A simple induction on the number of steps used to derive $v \mathrel{R} w$ produces contexts $p \trianglelefteq q$ with $\alpha(p) = v$ and $\alpha(q) = w$. Moreover $p$ (respectively $q$) is a tree-context whenever $v$ (respectively $w$) is a tree-context-type. The surjectivity of $\alpha$ is necessary for starting the induction in the case $\Box \mathrel{R} v$.

For the inclusion of $\trianglelefteq$ in $R$, we show that $\alpha(p) \mathrel{R} \alpha(q)$ holds for all contexts $p \trianglelefteq q$. The proof is by induction on the size of $p$:

• If $p$ is the empty context, then the result follows thanks to the first rule in the definition of $R$. If $p = a\Box$ then from $p \trianglelefteq q$ it follows that $q = q_1 a q_2$ for some contexts $q_1, q_2$, and using the first and second rule in the definition of $R$ we get that $\Box \mathrel{R} \alpha(q_1)$, $\Box \mathrel{R} \alpha(q_2)$, and $\alpha(a) \mathrel{R} \alpha(a)\alpha(q_2)$. Hence using the third rule in the definition of $R$ we get the desired result by composition.

• If there is a decomposition $p = p_1 a p_2$ where $p_1, p_2$ are contexts, then from $p \trianglelefteq q$ there must be a decomposition $q = q_1 a q_2$ with $p_1 \trianglelefteq q_1$ and $p_2 \trianglelefteq q_2$. By induction we get that $\alpha(p_1) \mathrel{R} \alpha(q_1)$ and $\alpha(p_2) \mathrel{R} \alpha(q_2)$. Applying the second rule to the latter we get that $\alpha(a p_2) \mathrel{R} \alpha(a q_2)$. We can now apply the third rule to derive $\alpha(p) \mathrel{R} \alpha(q)$.

• If there is a decomposition $p = p_1 p_2$ where $p_1, p_2$ are nonempty contexts and $p_1$ is of the form $(s + \Box + t)$, then from $p \trianglelefteq q$ there must be a decomposition $q = q_1 q_2$ with $p_1 \trianglelefteq q_1$ and $p_2 \trianglelefteq q_2$ and where $q_1$ is of the form $(s' + \Box + t')$. We conclude by induction and using the fourth rule in the definition of $R$.

• The remaining case is when $p = (t + \Box)$ (or $p = \Box + t$) where $t$ is a tree of the form $ap'0$ for some context $p'$. Then from $p \trianglelefteq q$ we have $q = aq'0 + q_1$ for some contexts $q_1, q'$, with $p' \trianglelefteq q'$. By induction we have $\alpha(p') \mathrel{R} \alpha(q')$. Using the second rule we get $\alpha(ap') \mathrel{R} \alpha(aq')$. Using the last rule we get $\alpha(p) \mathrel{R} \alpha(aq'0 + \Box)$. By the first rule we have $\Box \mathrel{R} \alpha(q_1)$. We conclude using the fourth rule. □

This implies that Theorem 5.2 yields a decidable characterization of the cca-piecewise testable languages.

Corollary 5.4. It is decidable if a regular forest language is cca-piecewise testable.

The proof of Theorem 5.2 follows the same outline as that of the proof of Theorem 4.1, but the details are somewhat more complicated.

5.1. Proof of Theorem 5.2. The proof that (5.1) and (5.2) are necessary is the same as in Section 4.1. The only difference is that instead of Fact 4.5 we use the following.

Fact 5.5. If $r$ is any context, $p \trianglelefteq q$ are tree-contexts, and $t$ is a tree or empty, then $rpt \trianglelefteq rqt$.

We now turn to the completeness proof in Theorem 5.2. The proof is very similar to the one of the previous section, with some subtle differences.

As before, we fix a language $L$ whose syntactic forest algebra $(H, V)$ satisfies all the identities of Theorem 5.2. We write $\alpha$ for the syntactic morphism.

We now write $s \sim_n t$ if the two forests $s, t$ have the same cca-pieces of size $n$. Likewise for contexts.

The main step is to show the following proposition. 

Proposition 5.6. For $n$ sufficiently large, if $t$ is a tree or empty, then $pat \sim_n pt$ entails $\alpha(pat) = \alpha(pt)$.

Theorem 5.2 follows from the above proposition in the same way as Theorem 4.1 follows from Proposition 4.7 in the previous section. The reason why we assume that $t$ is either a tree or empty is that when $s$ is a cca-piece of $s'$, then $s$ can be obtained from $s'$ by iterating one of the following two operations: removing a leaf, or removing a node which has only one child. Hence during the pumping argument yielding Theorem 5.2 from Proposition 5.6, it is enough to preserve the type only for these operations. We thus concentrate on showing Proposition 5.6.

We will now redefine the concept of fractal for our new, closest common ancestor setting. The key change is in the concept of a $vah$-decomposition. We change the notion of $x \in X$ being a $vah$-decomposition of $s$ as follows: all conditions of the old definition hold, but new conditions are added. First we require that $s[X]$ be a cca-piece of $s$; in particular this implies that if two elements of $X$ have a closest common ancestor in $s$ then this closest common ancestor is also in $X$. Moreover either $x$ has no descendants in $X$, or there is a minimal element of $X$ that has $x$ as a proper ancestor. In other words, the part of $s[X]$ that corresponds to $h$ is either empty, or is a tree. In particular, $s[X \setminus \{x\}]$ is a cca-piece of $s[X]$, which is the key property required below. From now on, when referring to a $vah$-decomposition, we use the new definition. In particular in the concept of a fractal $x_1 \in X_1, \ldots, x_k \in X_k$ inside $s$ we now have that for each $i$, $x_i \in X_i$ is a $vah$-decomposition of $s$ in the new sense.

The proof of the following lemma is exactly the same as its counterpart in Section 4.2 (Lemma 4.10) and is therefore omitted.

Lemma 5.7. Let $k \in \mathbb{N}$. For $n$ sufficiently large, if $t$ is a tree or empty, then $pat \sim_n pt$ entails the existence of a fractal of length $k$ inside $pat$.

A fractal $x_1 \in X_1, \ldots, x_k \in X_k$ inside $s$ is called cca-tame if $s$ can be decomposed as $s = q q_1 \cdots q_k s'$ (or $s = q q_k \cdots q_1 s'$) such that $x_1 \in q_1, \ldots, x_k \in q_k$ and such that either:

• Each $q_i$ is a tree-context whose root node belongs to $X_i \setminus \{x_i\}$.

• Each $q_i$ is a context of the form $\Box + t_i$, with $t_i$ a forest.

Lemma 5.8. Let $k \in \mathbb{N}$. For $n$ sufficiently large, if there is a fractal of length $n$ inside $pat$, then there is a cca-tame fractal of length $k$ inside $pat$.

Proof. The proof is essentially the same as for the counterpart in Section 4.2 (Lemma 4.12), only this time we need to be more careful to satisfy the more stringent requirements in a cca-tame fractal.

Let $m = 2k + 2$. Using the same reasoning as in the proof of Lemma 4.12, if $n$ is large enough then we may extract a subfractal of length $m$ where either:

• All the nodes $x_1, \ldots, x_m$ have the same closest common ancestor. In this case, we can extract a cca-tame subfractal, where each context is of the form $\Box + t_i$.

• The set $Y = \{y : y \text{ is a closest common ancestor of some } x_i, x_j\}$ contains a chain $y_1 < \cdots < y_m$, such that for each $i < m$, the set $Y_i = \{z : z \ge y_i \text{ and } z \not\ge y_{i+1}\}$ contains at least one of the nodes $x_i$. (There is a second case, where the nodes $y_1, \ldots, y_m$ are ordered the other way: with $y_{i+1}$ an ancestor of $y_i$. This case is treated analogously.) In particular, $y_i$ is the closest common ancestor of $x_i$ and any of the nodes $x_{i+1}, \ldots, x_m$. Since $X_{i+1}$ contains both $x_i$ and $x_{i+1}$, each node $y_i$ belongs to the set $X_{i+1}$. As we may have $x_i = y_i$, the desired cca-tame fractal is obtained as follows: We use $x_2 \in X_2, x_4 \in X_4, \ldots, x_{2k} \in X_{2k}$ as the fractal (recall that $m = 2k + 2$), while the decomposition $q q_1 \cdots q_k s'$ is chosen so that $q_i$ has its root in $y_{2i-1}$, and its hole in $y_{2i+1}$. □

Recall the definition of $u_{ijl}$ and $w_{ijl}$ as the image under $\alpha$ of the context obtained from $q_{i+1} \cdots q_j$ by restricting $s$ to $X_l$ and $X_l \setminus \{x_l\}$, respectively. Note that because of the new definition of fractals we have:

$$w_{ijl} \trianglelefteq u_{ijl} \quad\text{and}\quad u_{ijl} \trianglelefteq u_{ij(l+1)} \tag{5.3}$$

$$\text{if the } q_i \text{ are tree-contexts then } u_{ijl}, w_{ijl} \text{ are tree-context-types} \tag{5.4}$$

The definition of monochromaticity is the same as in the previous section, and Ramsey's Theorem gives the following.

Lemma 5.9. If there is a cca-tame fractal of sufficiently large size inside $pat$, then there is a monochromatic cca-tame fractal of size $m = \omega + 2$ inside $pat$.

We will now take a monochromatic cca-tame fractal, and conclude by showing that $\alpha(pat) = \alpha(pt)$.

Lemma 5.10. If there is a monochromatic cca-tame fractal of size $\omega + 2$ inside $pat$, then $vah = vh$.

Proof. Fix a monochromatic cca-tame fractal of size $m = \omega + 2$ and let $k = m - 1$. Since $x_k \in X_k$ is a $vah$-decomposition, the statement of the lemma follows once we show that $\alpha$ assigns the same type to the forests $s[X_k]$ and $s[X_k \setminus \{x_k\}]$.

Recall that the type of the forest $s[X_k]$ can be decomposed as follows (the case where $s = q q_m q_{m-1} \cdots q_1 s'$ is treated similarly by duality).

$$\alpha(s[X_k]) = \alpha(q[X_k]) \cdot u_{01k} \cdot u_{12k} \cdots u_{(k-1)kk} \cdot \alpha(q_m[X_k]\, s'[X_k])$$

The type of $s[X_k \setminus \{x_k\}]$ is decomposed the same way, only $u_{(k-1)kk}$ is replaced by $w_{(k-1)kk}$. Let $h = \alpha(q_m[X_k]\, s'[X_k])$ and notice that if $q_m$ is a tree-context then $h$ is a tree-type. Therefore, the lemma will follow if

$$u_{01k} \cdot u_{12k} \cdot u_{23k} \cdots u_{(k-1)kk} \cdot h = u_{01k} \cdot u_{12k} \cdot u_{23k} \cdots w_{(k-1)kk} \cdot h.$$

Since the fractal is monochromatic, and since $k = \omega + 1$, the above becomes

$$u_{01k}^{\omega} \cdot u_{(k-1)kk} \cdot h = u_{01k}^{\omega} \cdot w_{(k-1)kk} \cdot h.$$

By (5.3) and monochromaticity, we have

$$w_{(k-1)kk},\; u_{(k-1)kk} \trianglelefteq u_{(k-1)k(k+1)} = u_{01k}. \tag{5.5}$$

We now have two cases. If all the $q_i$ are tree-contexts, we conclude using identity (5.1), which can be applied because of (5.5), (5.4), and the fact that $h$ is then a tree-type. If all the $q_i$ are contexts of the form $\Box + t_i$, we conclude from (5.5) using identity (5.2). □

5.2. An equivalent set of identities. In this section, we give a set of identities that is equivalent to the one used in Theorem 5.2. The rationale is the same as in Proposition 4.15: we want to avoid the use of $v \trianglelefteq w$ in the identities.

Proposition 5.11. The conditions on the syntactic morphism stated in Theorem 5.2 are equivalent to the following equalities:

$$(uv)^{\omega} h = (uv)^{\omega} u h \tag{5.6}$$

whenever $h$ is a tree-type or empty, and

$$(uv)^{\omega} = v (uv)^{\omega} \tag{5.7}$$

whenever $u$ and $v$ are tree-context-types, and

$$(u(\Box + vwh))^{\omega} g = (u(\Box + vwh))^{\omega}\, u(\Box + vh)\, g = (u(\Box + vh))\, (u(\Box + vwh))^{\omega} g \tag{5.8}$$

whenever $u$ is a tree-context-type or empty and $g, h$ are tree-types or empty.

The rest of Section 5.2 is devoted to showing the above proposition.

It is immediate to see that identity (5.1) implies identity (5.7) and that identity (5.1) implies identity (5.8). We now show that identities (5.1) and (5.2) imply identity (5.6). Let $u$ and $v$ be two context-types and $h$ be a tree-type. We want to show that $(uv)^{\omega} h = (uv)^{\omega} u h$.

We consider several cases.

• In the first case we assume that $u = u_1 u_2$ for some tree-context-type $u_2$. In that case we have:

$$(uv)^{\omega} h = (uv)^{\omega} (uv)^{\omega} (uv)^{\omega} h = (u_1 u_2 v u_1 u_2 v)^{\omega} (u_1 u_2 v)^{\omega} h = u_1 (u_2 v u_1)^{\omega - 1} (u_2 v u_1 u_2 v u_1)^{\omega} u_2 v h$$

Notice now that $u_2 v \trianglelefteq u_2 v u_1 u_2 v u_1$ and that $u_2 v u_1 u_2 \trianglelefteq u_2 v u_1 u_2 v u_1$. As $u_2$ is a tree-context-type, all the context-types involved are tree-context-types and we can use identity (5.1) twice and replace $u_2 v$ by $u_2 v u_1 u_2$. This yields:

$$(uv)^{\omega} h = u_1 (u_2 v u_1)^{\omega - 1} (u_2 v u_1 u_2 v u_1)^{\omega} u_2 v u_1 u_2 h$$

And we have

$$(uv)^{\omega} h = (u_1 u_2 v u_1 u_2 v u_1 u_2 v)^{\omega} u_1 u_2 h$$

By idempotency, this yields the desired result:

$$(uv)^{\omega} h = (uv)^{\omega} u h$$

• The second case, in which we assume that $v = v_1 v_2$ for some tree-context-type $v_2$, is treated similarly.

$$(uv)^{\omega} h = (u v_1 v_2)^{\omega} h = (u v_1 v_2)^{\omega} (u v_1 v_2)^{\omega} h$$

Therefore,

$$(uv)^{\omega} h = u v_1 (v_2 u v_1)^{\omega - 1} (v_2 u v_1)^{\omega} v_2 h$$

Notice now that $v_2 \trianglelefteq v_2 u v_1$ and that $v_2 u \trianglelefteq v_2 u v_1$. As $v_2$ is a tree-context-type, all the context-types involved are tree-context-types and we can use identity (5.1) twice and replace $v_2$ by $v_2 u$. This yields:

$$(uv)^{\omega} h = u v_1 (v_2 u v_1)^{\omega - 1} (v_2 u v_1)^{\omega} v_2 u h$$

And we have

$$(uv)^{\omega} h = (uv)^{\omega} (uv)^{\omega} u h = (uv)^{\omega} u h$$





• When none of the above cases works, we must have $u = f_1 + \square + f_2$ and $v = g_1 + \square + g_2$. In
that case we have $(uv)^\omega h = \omega(f_1 + g_1) + h + \omega(g_2 + f_2)$, and we conclude using identity (5.2)
as $f_1 \trianglelefteq (f_1 + g_1)$ and $f_2 \trianglelefteq (g_2 + f_2)$.

We now consider the converse implication in Proposition 5.11. Assume that identities (5.6)-(5.8)
hold. We show that identities (5.1) and (5.2) are satisfied.
We first show the following lemma:

Lemma 5.12. If $u$ is a tree-context-type, $v$, $w$, $w'$ are (not necessarily tree) context-types
with $w' \trianglelefteq w$, and $g$, $h$ are either tree-types or empty, then the following identity holds

$(u(\square + vwh))^\omega g = (u(\square + vwh))^\omega u(\square + vw'h) g$ (5.9)

Note that the identity (5.2) is a direct consequence of the above, by taking $u$, $v$ to be
the empty context, and $g$, $h$ to be the empty tree. We will also use the above lemma to
show (5.1), but this will require some more work.

Proof. The proof is by induction on the number of steps used to derive $w' \trianglelefteq w$.

• Consider first the case when $w$, $w'$ can be decomposed as

$w = w_1 w_2$, $w' = w_1' w_2'$, $w_1' \trianglelefteq w_1$, $w_2' \trianglelefteq w_2$.

Two applications of the induction assumption give us for all tree-type or empty $g$:

$(u(\square + v w_1 w_2 h))^\omega g = (u(\square + v w_1 w_2 h))^\omega u(\square + v w_1 w_2' h) g$ (5.10)

$(u(\square + v w_1 w_2' h))^\omega g = (u(\square + v w_1 w_2' h))^\omega u(\square + v w_1' w_2' h) g$ (5.11)

As $u$ is a tree-context-type we can iterate on (5.10) and then apply (5.11) in order to
derive:

$(u(\square + v w_1 w_2 h))^\omega g = (u(\square + v w_1 w_2 h))^\omega (u(\square + v w_1 w_2' h))^\omega u(\square + v w_1' w_2' h) g$ (5.12)

As $u$ is a tree-context-type, we can apply (5.10) again, in the reverse direction, in order to
derive the desired result.

• Consider now the case when $w$, $w'$ can be decomposed as

$w = w_1 w_2 w_3$, $w' = w_1' w_3'$, $w_1' \trianglelefteq w_1$, $w_3' \trianglelefteq w_3$,
with $w_3'$ a tree-context-type or empty. We first use the induction assumption to get

$(u(\square + v w_1 w_2 w_3 h))^\omega g = (u(\square + v w_1 w_2 w_3 h))^\omega u(\square + v w_1 w_2 w_3' h) g$ (5.13)
By applying the identity (5.8), we get for all tree-type or empty $g$:

$(u(\square + v w_1 w_2 w_3' h))^\omega g = (u(\square + v w_1 w_2 w_3' h))^\omega u(\square + v w_1 w_3' h) g$ (5.14)

Note that it is important here that $w_3' h$ is either a tree-type or empty, as required by (5.8).
Finally, we apply once again the induction assumption to get

$(u(\square + v w_1 w_3' h))^\omega g = (u(\square + v w_1 w_3' h))^\omega u(\square + v w_1' w_3' h) g$ (5.15)

As $u$ is a tree-context-type, we can first iterate on (5.13), then iterate on (5.14), and finally
apply (5.15) in order to get:

$(u(\square + v w_1 w_2 w_3 h))^\omega g = (u(\square + v w_1 w_2 w_3 h))^\omega (u(\square + v w_1 w_2 w_3' h))^\omega (u(\square + v w_1 w_3' h))^\omega u(\square + v w_1' w_3' h) g$

Because $u$ is a tree-context-type we can now apply (5.13) and (5.14) in reverse to eliminate
the inner products and obtain the desired result.
• Finally, consider the case when $w$, $w'$ can be decomposed as

$w = \square + w_1 0$, $w' = \square + w_1' 0$, $w_1' \trianglelefteq w_1$.

In this case, the identity becomes:

$(u(\square + v' w_1 0))^\omega g = (u(\square + v' w_1 0))^\omega u(\square + v' w_1' 0) g$

where $v' = v(h + \square)$. The result now follows by induction assumption with $w_1$, $w_1'$ in place
of $w$, $w'$.

We now claim that all cases have been considered. Assume first that either $w'$ or $w$ consists
of several trees. Then, by the definition of $\trianglelefteq$, $w'$ and $w$ can be decomposed into smaller
forests and we conclude using the first bullet. We can thus assume that both $w$ and $w'$ are
trees. If $w'$ contains a node between its root and its hole then, by definition of $\trianglelefteq$, we can
decompose $w$ and $w'$ and apply the second bullet. Similarly we can transform $w$ using the
first bullet until the third bullet can be applied. □

We now derive the first part of identity (5.1). Let $u$, $v$ be tree-context-types such that
$v \trianglelefteq u$, and let $h$ be a tree-type. We show by induction on $v$ that $u^\omega h = u^\omega v h$. If $v = v_1 v_2$
where both $v_1$ and $v_2$ are tree-context-types then we consider $v_2$ first and $v_1$ next:

$u^\omega h = u^\omega v_2 h = u^\omega v_1 v_2 h$.

It is important here that $v_2 h$ is a tree-type.

Therefore it is enough to consider the case where $v$ is of the form $\alpha(a)(\square + f)$ for some
letter $a$ and some forest-type $f$. In the sequel we write $a$ instead of $\alpha(a)$ in order to improve
readability. From $v \trianglelefteq u$ we get $u = u_1 a(\square + g) u_2$ where $u_1$ and $u_2$ are tree-context-types
and $f \trianglelefteq g$. Then we have from identity (5.6), for any tree-type $h$:

$u^\omega h = (u_1 a(\square + g) u_2)^\omega h = u^\omega u_1 a(\square + g) h$

$u^\omega h = (u_1 a(\square + g) u_2)^\omega h = u^\omega u_1 h$

and therefore, as $a(\square + g)h$ is a tree-type, we get for any tree-type $h$:

$u^\omega h = u^\omega a(\square + g) h$ (5.16)

Iterating on (5.16) we get:

$u^\omega h = u^\omega a(\square + g) h = u^\omega (a(\square + g))^\omega h$.

It will therefore be enough to show

$(a(\square + g))^\omega h = (a(\square + g))^\omega a(\square + f) h$

for $f \trianglelefteq g$. This, however, is a consequence of (5.9).

The second part of identity (5.1), $u^\omega = v u^\omega$, is shown the same way using identity (5.7)
instead of identity (5.6) and building on (5.17) below instead of (5.9).

Lemma 5.13. If $u$ is a tree-context-type, $v$, $w$, $w'$ are (not necessarily tree) context-types
with $w' \trianglelefteq w$, and $g$, $h$ are either tree-types or empty, then the following identity holds

$(u(\square + vwh))^\omega = (u(\square + vw'h)) (u(\square + vwh))^\omega$. (5.17)
Proof. Identical to the proof of Lemma 5.12, applying the other side of identity (5.8). □
6. Variations

In this section we show that the techniques we developed in the previous sections are fairly
robust and can be adapted to many situations. We describe some of them.

6.1. Languages definable in $\Sigma_1$. Here we treat the relatively simple case of languages
defined by $\Sigma_1$ sentences (rather than boolean combinations of such formulas). We will
prove:

Theorem 6.1. It is decidable whether a given regular forest language $L$ is definable by a
$\Sigma_1(<, <_{dfs})$ sentence.

We will show how to do this using the syntactic forest algebra and syntactic morphism, 
although this could be carried out just as well using an automaton model. The argument 
we give is based on an idea of Pin [12] concerning ordered monoids. 

Let $L \subseteq H_A$ be a regular forest language, and let $\alpha_L : A^\Delta \to (H_L, V_L)$ be its syntactic
morphism. We set $X = \alpha_L(L) \subseteq H_L$. Note that $L = \alpha_L^{-1}(X)$. For $h_1, h_2 \in H_L$ we define

$h_1 \leq_{H_L} h_2$

if for all $v \in V_L$, $v h_2 \in X$ implies $v h_1 \in X$. Further, for $v_1, v_2 \in V_L$ we define

$v_1 \leq_{V_L} v_2$

if for all $h \in H_L$, $v_1 h \leq_{H_L} v_2 h$.

Proposition 6.2. The relations $\leq_{H_L}$ and $\leq_{V_L}$ are partial orders on $H_L$ and $V_L$, respectively.
These orders are compatible with the algebra operations in the sense that whenever $h_1 \leq_{H_L} h_2$,
$u_1 \leq_{V_L} u_2$, and $v_1 \leq_{V_L} v_2$, we have

$v_1 h_1 \leq_{H_L} v_2 h_2$, $u_1 v_1 \leq_{V_L} u_2 v_2$.

Proof. This is straightforward from the definitions: Transitivity and reflexivity of $\leq_{H_L}$ are
obvious. To prove antisymmetry, suppose $h_1 \leq_{H_L} h_2$ and $h_2 \leq_{H_L} h_1$. Let $s_1, s_2 \in H_A$ with
$\alpha_L(s_i) = h_i$. Let $p \in V_A$ and set $v = \alpha_L(p)$. If $p s_2 \in L$ then $v h_2 = \alpha_L(p s_2) \in X$, so $\alpha_L(p s_1) =
v h_1 \in X$ and thus $p s_1 \in L$. Likewise $p s_1 \in L$ implies $p s_2 \in L$, so $s_1 \sim_L s_2$ and thus $h_1 = h_2$.

Transitivity and reflexivity of $\leq_{V_L}$ are likewise trivial, and antisymmetry follows from
the antisymmetry of $\leq_{H_L}$ and the faithfulness of the action of $V_L$ on $H_L$.

For the multiplicative properties, let $h_i$, $u_i$, $v_i$ be as in the statement of the Proposition.
If $v_2 h_2 \in X$, then $v_2 h_1 \in X$ (since $h_1 \leq_{H_L} h_2$) and thus $v_1 h_1 \in X$ (since $v_1 \leq_{V_L} v_2$). Thus
$v_1 h_1 \leq_{H_L} v_2 h_2$. Similarly $u_2 v_2 h \in X$ implies $u_1 v_2 h \in X$ (since $u_1 \leq_{V_L} u_2$) and thus $u_1 v_1 h \in X$
(since $v_1 \leq_{V_L} v_2$) so $u_1 v_1 \leq_{V_L} u_2 v_2$.

□ 
Theorem 6.3. Let $L \subseteq H_A$ be a regular forest language. The following are equivalent:

• $L$ is definable by a $\Sigma_1(<, <_{dfs})$ formula.

• For all contexts $p$, $q$ and forests $t$,

$pt \in L \Rightarrow pqt \in L$

• For all $v \in V_L$, $v \leq_{V_L} \square$.

Proof. The first condition implies the second, because inserting new nodes in a forest does
not change the $<$ or $<_{dfs}$ relation among the already existing nodes.

To show that the second condition implies the first, we use a pumping argument: Let
$n = |H_L|$. There exists $K > 0$ such that any forest $s$ with at least $K$ nodes has a factorization

$s = q_1 q_2 \cdots q_n t$

for some forest $t$ and nonempty contexts $q_i$. In particular, there is a factorization $s = pqt$ with
$\alpha_L(t) = \alpha_L(qt)$. Thus a forest belongs to $L$ if and only if it is obtained by successive insertion
of nodes starting with a forest in $L$ of size less than $K$. We can write a $\Sigma_1$ sentence $\varphi$ that
describes all the relations among nodes of the forests of size less than $K$ that belong to $L$,
and thus this sentence defines $L$.

To show the equivalence of the second and third conditions, suppose the second condition
holds. We need to show $v \leq_{V_L} \square$ for all $v \in V_L$. This says that for every forest $s$
and every context $p$, $s \in L$ implies $ps \in L$, which follows from the second condition.
Conversely, suppose the third condition holds, and that $p$, $q$ are contexts and $t$ a forest with
$pt \in L$. Then $\alpha_L(pt) = \alpha_L(p) \square \alpha_L(t) \in X$. By the multiplicative properties of the partial
order, $\alpha_L(p) \alpha_L(q) \alpha_L(t) \in X$, and thus $pqt \in L$. □

Theorem 6.1 is an immediate corollary, since one can effectively compute the order $\leq_{V_L}$
given the syntactic algebra and syntactic morphism of L.
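
The decision procedure behind Theorem 6.1 can be sketched as follows. This is only an illustration under stated assumptions: the finite algebra is given explicitly, each context-type is a tuple `v` with `v[h]` the forest-type obtained by applying it to `h`, and `X` is the accepting subset of the forest-types. The function names are hypothetical, and the two-element algebra is a toy example (forests over {a, b} that contain at least one a), not taken from the paper.

```python
def leq_h(h1, h2, V, X):
    """h1 <= h2 iff for every context-type v, v(h2) in X implies v(h1) in X."""
    return all(v[h1] in X for v in V if v[h2] in X)


def leq_v(v1, v2, H, V, X):
    """v1 <= v2 iff v1(h) <= v2(h) for every forest-type h."""
    return all(leq_h(v1[h], v2[h], V, X) for h in H)


def every_context_below_identity(H, V, X, identity):
    """Third condition of Theorem 6.3: v <= identity for every v in V."""
    return all(leq_v(v, identity, H, V, X) for v in V)


# Toy algebra (assumed): forest-type 0 = "no a-node", 1 = "some a-node".
H = [0, 1]
identity = (0, 1)    # context-types without an a-node
has_a = (1, 1)       # context-types containing an a-node
V = [identity, has_a]

print(every_context_below_identity(H, V, {1}, identity))  # True
print(every_context_below_identity(H, V, {0}, identity))  # False
```

With accepting set `{1}` the language is closed under inserting nodes, so the check succeeds; its complement, accepting set `{0}`, is not, and the check fails.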

6.2. Commutative languages. In this section we consider forest languages that are com- 
mutative, i.e., closed under rearranging siblings. 

A forest t' is called a reordering of a forest t if it is obtained from t by rearranging 
the order of siblings. In other words, reordering is the least equivalence relation on forests 
that identifies all pairs of forests of the form p(s + t) and p(t + s). A forest language is 
called commutative if it is closed under reordering. In other words, a forest language is 
commutative if and only if its syntactic forest algebra satisfies the identity 

g+h= h+g . 

We say a forest $s$ is a commutative piece of $t$ if $s$ is a piece of some reordering of $t$. A
forest language $L$ is called commutative-piecewise testable if for some $n \in \mathbb{N}$, membership
of $t$ in $L$ depends only on the set of commutative pieces of $t$ that have no more than $n$
nodes. This definition also has a counterpart in logic, by removing the forest-order from
the signature. The following proposition is immediate:

Proposition 6.4. A forest language is commutative-piecewise testable iff it is definable by
a Boolean combination of $\Sigma_1(<)$ formulas.
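
The notions of piece, reordering and commutative piece can be illustrated with a small brute-force sketch. It assumes the characterization of pieces by node removal (removing a node promotes its children to its place), and it encodes a forest as a tuple of `(label, children)` pairs. The function names are hypothetical, and enumerating every reordering is exponential, so this is meant for tiny examples only.

```python
from functools import lru_cache
from itertools import permutations, product


@lru_cache(maxsize=None)
def is_piece(s, t):
    """True if forest s can be obtained from forest t by removing nodes."""
    if not s:
        return True
    if not t:
        return False
    (a, t_kids), t_rest = t[0], t[1:]
    # Option 1: remove the first root of t; its children take its place.
    if is_piece(s, t_kids + t_rest):
        return True
    # Option 2: keep that root; it must then match the first root of s.
    (b, s_kids), s_rest = s[0], s[1:]
    return a == b and is_piece(s_kids, t_kids) and is_piece(s_rest, t_rest)


def reorderings(forest):
    """Yield every forest obtained by permuting siblings at every level."""
    options = [[(label, kids) for kids in reorderings(children)]
               for label, children in forest]
    for choice in product(*options):
        for perm in permutations(choice):
            yield perm


def is_commutative_piece(s, t):
    """True if s is a piece of some reordering of t."""
    return any(is_piece(s, r) for r in reorderings(t))


leaf = lambda x: (x, ())
t = (("a", (leaf("b"), leaf("c"))),)   # the tree a(b + c)
s = (leaf("c"), leaf("b"))             # the forest c + b

print(is_piece(s, t))              # False
print(is_commutative_piece(s, t))  # True
```

Here c + b is not a piece of a(b + c), because the forest-order of b and c is reversed, but it is a piece of the reordering a(c + b).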

If a language is commutative-piecewise testable, then it is clearly commutative and 
piecewise testable (in the more powerful, noncommutative, sense). Below we show that the 
converse implication is also true: 
Theorem 6.5. A forest language is commutative-piecewise testable if and only if it is
commutative and piecewise testable.

As piecewise testability is decidable, by Corollary 3.2, and commutativity is obviously
decidable, the theorem above implies decidability:

Corollary 6.6. It is decidable if a regular forest language is commutative-piecewise testable.

Theorem 6.5 follows quite easily from:

Lemma 6.7. Let $n \in \mathbb{N}$. For $k$ sufficiently large, if two forests have the same commutative
pieces of size at most $k$, then they can be both reordered so that the resulting forests have
the same pieces of size at most $n$.

To see this, assume $L$ is a commutative and piecewise testable forest language. We need
to show that there is a $k$ such that if $t$ and $s$ have the same commutative pieces of size at most $k$
then $t \in L$ iff $s \in L$. As $L$ is piecewise testable there exists an $n$ such that whenever $s$ and $t$
have the same pieces of size no more than $n$ then $t \in L$ iff $s \in L$. Let $k$ be the number given
by Lemma 6.7 for that $n$. Assume now that $s$ and $t$ have the same commutative pieces of
size at most $k$. By Lemma 6.7 they can be reordered into respectively $s'$ and $t'$ such that $s'$ and $t'$
have the same pieces of size at most $n$. Hence $s' \in L$ iff $t' \in L$. But as $L$ is commutative this yields
$s \in L$ iff $t \in L$ as desired.



Proof of Lemma 6.7. Let $P(s)$ be the set of pieces of $s$ that have size at most $n$. As in
Lemma 4.6 there is some $k$ such that any forest $s$ has a piece $t \preceq s$ of size at most $k$ with
$P(s) = P(t)$. Let now $s_1$, $s_2$ be two forests with the same commutative pieces of size at most $k$. For
$i = 1, 2$, consider the families

$\mathcal{P}_i = \{P(s_i') : s_i' \text{ is a reordering of } s_i\}$.

To prove the lemma, we need to show that the families $\mathcal{P}_1$ and $\mathcal{P}_2$ share a common element.
To this end, we show that for any $X \in \mathcal{P}_1$, there is some $Y \in \mathcal{P}_2$ with $X \subseteq Y$, and vice versa;
in particular, the families share the same maximal elements. Let then $X = P(s_1') \in \mathcal{P}_1$. By
the choice of $k$, the forest $s_1'$ has a piece $t$ of size at most $k$ with $P(t) = X$. Therefore $t$ is a
commutative piece of $s_1$ of size at most $k$. By assumption, the forest $t$ is also a commutative piece
of $s_2$ and therefore a piece of some reordering $s_2'$ of $s_2$. Hence $X \subseteq P(s_2') \in \mathcal{P}_2$. □

Similarly we can define the notion of commutative-cca-piece and commutative-cca-piecewise
testable forest language. Using the same arguments as above we can prove:

Proposition 6.8. A forest language is commutative-cca-piecewise testable iff it is definable
by a Boolean combination of $\Sigma_1(\sqcap)$ formulas.

Theorem 6.9. A forest language is commutative-cca-piecewise testable if and only if it is
commutative and cca-piecewise testable.

Corollary 6.10. It is decidable if a regular forest language is commutative-cca-piecewise
testable.
6.3. Tree languages. Our previous results provided decidable characterizations for
forest languages, and in fact the algebraic theory used here works best when forests, rather
than trees, are treated as the fundamental object. Traditionally, though, interest has focused
on trees rather than forests. Thus we want to give a decidable characterization of the
piecewise testable tree languages or, equivalently, the sets of trees that are definable by
Boolean combinations of $\Sigma_1$ sentences.

For certain logics, like first-order logic over the descendant relation, or first-order logic 
over successor, one can write a sentence that says "this forest is a tree" , and thus there is 
no need to treat tree and forest languages separately. For piecewise testability, we need to 
do something more, since the set of all trees over a finite alphabet A is not definable by a 
Boolean combination of $\Sigma_1$ sentences over any of the predicates mentioned in this paper.

We define a tree piecewise testable language over a finite alphabet $A$ to be the intersection
of a piecewise testable forest language with the set of all trees over $A$. In other words,
these are the languages definable by a Boolean combination of $\Sigma_1(<, <_{dfs})$ formulas
when we interpret these formulas in trees. This is preferable to defining a piecewise testable
tree language to be a tree language that is piecewise testable (as a forest language), since
the latter definition would only define tree languages that are either finite or contain only
chains (no branching). Moreover it would not correspond to the tree languages definable by
a Boolean combination of $\Sigma_1(<, <_{dfs})$ formulas. The cases when the pieces are assumed to
be commutative and/or take into account closest common ancestor are defined analogously.

We will obtain our decidability result by a general method for translating algebraic char- 
acterizations of classes of forest languages to characterizations of the corresponding classes 
of tree languages. This method will apply to all the cases we considered earlier: piecewise 
testable languages, cca-piecewise testable languages, and their commutative counterparts. 

First, suppose

$\alpha : A^\Delta \to (H, V)$

is a surjective forest algebra morphism. Recall that we denote by $H_A$ the set of all forests over
$A$. Based on $\alpha$, we define an equivalence relation on $H_A$: We write $s \sim t$ if for all contexts
$p$ such that $ps$ and $pt$ are both trees (this happens if $p$ is a tree-context or if $p$ is the empty
context and both $t$ and $s$ are trees) we have $\alpha(ps) = \alpha(pt)$. Notice that if $s$ and $t$ are such
that $\alpha(s) = \alpha(t)$ then $s \sim t$, and that if $s$ and $t$ are both trees then $s \sim t$ implies $\alpha(s) = \alpha(t)$
(take $p = \square$ in the definition of $\sim$). It is clear that if $s \sim t$ then for any context $q$, $qs \sim qt$.
Thus $\sim$ defines a forest algebra congruence on $A^\Delta$. Let

$\alpha' : A^\Delta \to (H', V')$

be the projection morphism onto the quotient by this congruence. We call $\alpha'$ the tree
reduction of $\alpha$. From the remark above it follows that if $t$ and $s$ are both trees then $\alpha(s) =
\alpha(t)$ iff $\alpha'(s) = \alpha'(t)$.

Let $F$ be a family of forest languages over $A$. We say that a set $\mathcal{F}$ of surjective forest
algebra morphisms with domain $A^\Delta$ characterizes $F$ if a forest language $L$ belongs to $F$ if
and only if $L$ is recognized by some morphism in $\mathcal{F}$. We will further assume that $\mathcal{F}$ is closed
in the following sense: suppose $\alpha : A^\Delta \to (H_1, V_1)$ belongs to $\mathcal{F}$, and $\beta : (H_1, V_1) \to (H_2, V_2)$
is a morphism onto a finite forest algebra. Then $\beta\alpha$ belongs to $\mathcal{F}$.

Theorem 6.11. Let $F$ and $\mathcal{F}$ be as above, and let $L \subseteq H_A$ be a set of trees. Then there
is a forest language $K \in F$ such that $L$ consists of all the trees in $K$ if and only if the tree
reduction of the syntactic morphism $\alpha_L$ of $L$ belongs to $\mathcal{F}$.
Proof. Let $L$ be a tree language, $\alpha_L$ be its syntactic morphism and let $\alpha'_L : A^\Delta \to (H'_L, V'_L)$
be its tree reduction.

Assume first that there is a forest language $K \in F$ such that $L$ consists of all the trees in
$K$. Let $\alpha_K : A^\Delta \to (H_K, V_K)$ be the syntactic morphism of $K$. By definition, $\alpha_K \in \mathcal{F}$. Fix
$h \in H_K$ and let $t$, $s$ be forests such that $\alpha_K(t) = h = \alpha_K(s)$. We show that $\alpha'_L(s) = \alpha'_L(t)$.
Suppose this is not the case. Then there exists a context $p$ such that $ps$ and $pt$ are both
trees but $\alpha_L(ps) \neq \alpha_L(pt)$. By definition of $\alpha_L$ this means that there exists a context $q$ such
that $qps \in L$ but $qpt \notin L$. From $qps \in L$ we know that $qps$ is a tree, hence, as $pt$ is a tree,
$qpt$ must also be a tree. By hypothesis this implies $qps \in K$ but $qpt \notin K$, contradicting
$\alpha_K(t) = \alpha_K(s)$.

Since $V'_L$ acts faithfully on $H'_L$, it follows that for any contexts $p$ and $q$, $\alpha_K(p) = \alpha_K(q)$
implies $\alpha'_L(p) = \alpha'_L(q)$. Thus $\alpha'_L = \beta\alpha_K$ for some morphism $\beta : (H_K, V_K) \to (H'_L, V'_L)$ sending
$h \in H_K$ to $\beta(h) = \alpha'_L(\alpha_K^{-1}(h))$. By hypothesis on $\mathcal{F}$ this implies that $\alpha'_L \in \mathcal{F}$.

Conversely, suppose that $\alpha'_L$ belongs to $\mathcal{F}$. Let $X = \alpha'_L(L)$ and set $K = (\alpha'_L)^{-1}(X)$. From
the hypothesis it follows that $K \in F$. Assume that $t$ is a tree such that $\alpha'_L(t) \in X$. By
definition of $X$, there is a tree $s \in L$ such that $\alpha'_L(s) = \alpha'_L(t)$. But as $\alpha'_L$ is the tree reduction
of $\alpha_L$, we have $\alpha'_L(s) = \alpha'_L(t)$ implies $\alpha_L(s) = \alpha_L(t)$ and therefore $t \in L$. Hence $L$ is the set of
trees of $K$. □

As a result we have: 

Corollary 6.12. It is decidable if a regular tree language is tree (commutative)
(cca-)piecewise testable.

Proof. We only give the proof for the piecewise testable case. The other cases are handled
similarly.

Let $F$ be the family of piecewise testable forest languages over $A$, and let $\mathcal{F}$ be the family
of morphisms from $A^\Delta$ onto finite forest algebras that satisfy the identities of Theorem 4.1.
Notice that from Proposition 4.15 it follows that if $\alpha \in \mathcal{F}$ then $\beta\alpha \in \mathcal{F}$ for every onto morphism
$\beta$. Hence $F$ and $\mathcal{F}$ satisfy the hypothesis of Theorem 6.11.

Consequently, a regular tree language $L$ is tree piecewise testable if and only if the
tree reduction of $\alpha_L$ belongs to $\mathcal{F}$. It remains to show that we can effectively compute the
image of the tree reduction given $\alpha_L$. Consider $h \in H_L$ and notice that all the forests in
$\alpha_L^{-1}(h)$ agree on $\alpha'_L$. Hence the procedure amounts to deciding which pairs of elements of
the syntactic forest algebra are identified under the reduction, which we can do as long as
we know which elements are images under $\alpha_L$ of trees. It is easy to see that if an element
of Hi is the image of a tree, then it is the image of a tree of depth at most in which 
each node has at most children, so we can effectively decide this as well. □ 

6.4. Horizontal order. We could also consider other natural predicates over forests. Recall
for instance the horizontal-order: $x <_h y$ expresses the fact that $x$ is
a sibling of $y$ occurring strictly before $y$ in the forest-order.

Correspondingly we say that $s$ is a horizontal-piece of $t$, denoted $s \preceq_h t$, if there is an
injective mapping from nodes of $s$ to nodes of $t$ that preserves the horizontal-order and the
ancestor relationship. An equivalent definition is that the piece relation is the reflexive
transitive closure of the relation

$\{(pt, pat) : p$ is a context, $a$ is a node, $t$ is a forest or empty

and either $t$ is empty or $a$ does not have a sibling in $pat\}$

From this notion of horizontal-piece we derive the notion of horizontal-piecewise testability
as expected and the very same proofs as in Section 4 yield:

Proposition 6.13. A forest language is horizontal-piecewise testable iff it is definable by a
Boolean combination of $\Sigma_1(<_h, <_{dfs})$ formulas.

Theorem 6.14. A forest language is horizontal-piecewise testable if and only if its syntactic
algebra satisfies the identity

$u^\omega v = u^\omega = v u^\omega$

for all $u, v \in V_L$ such that $v \preceq_h u$.

This implies decidability of horizontal-piecewise testability and it would be interesting
to see what would be the corresponding equivalent set of identities that does not make use
of $\preceq_h$, in the spirit of Proposition 4.15.

A straightforward adaptation of Section 5 would also give a decidable characterization
of definability by a Boolean combination of $\Sigma_1(<, <_h, \sqcap)$ formulas.



Simon's theorem on $\mathcal{J}$-trivial monoids has emerged as one of the fundamental results in
the algebraic theory of automata on words. The principal contribution of the present paper
has been to show that the use of forest algebras leads to a natural generalization of this
theorem to trees and forests. In proving this generalization we have introduced a number
of new techniques that we believe will prove useful in the continuing development of the
algebraic theory of tree automata.

Let us briefly indicate a few directions for further research. There is a purely algebraic
formulation of Simon's theorem, stating that every finite $\mathcal{J}$-trivial monoid $M$ is the quotient
of a finite monoid $N$ that admits a partial order compatible with the multiplication in $N$ and
in which the identity is the maximum element. Our new results have a similar formulation:
Every finite forest algebra satisfying the identities of Section 4 is the quotient of an algebra
that admits compatible partial orders on both its horizontal and vertical components. In
fact, Straubing and Thérien [17] have proved this order property of finite $\mathcal{J}$-trivial monoids
directly, yielding a quite different proof of Simon's theorem. It would be interesting to know
whether such an argument is also possible for forest algebras.

In the word case, the boolean combinations of $\Sigma_1$-definable languages form the first level
of a hierarchy whose union is the first-order definable languages. Little is known about the
higher levels of this hierarchy, apart from the fact that it is strict. Indeed, the problem of
effectively characterizing the languages definable by boolean combinations of $\Sigma_2$-sentences
has been open for many years. In contrast, the first-order definable languages themselves
constitute one of the first classes for which an effective algebraic characterization was given:
these are exactly the languages whose syntactic monoids are aperiodic (McNaughton and
Papert [11]). The corresponding problem for trees and forests, however, remains open: We
possess non-effective algebraic characterizations for the forest languages definable by
first-order sentences over the ancestor relation, and for the related subclasses CTL and CTL*
(see Bojańczyk et al. [5]), but the problem of finding effective tests for membership of a
language in any of these classes remains one of the greatest challenges in this work.